unofficial mirror of bug-guix@gnu.org 
* bug#39885: Bioconductor URI, fallback and time-machine
@ 2020-03-03 15:59 zimoun
  2020-03-23 21:20 ` Ricardo Wurmus
                   ` (4 more replies)
  0 siblings, 5 replies; 31+ messages in thread
From: zimoun @ 2020-03-03 15:59 UTC (permalink / raw)
  To: 39885, me, rekado

Dear,

Currently, the URI scheme (see 'bioconductor-uri' in
guix/build-system/r.scm) is:

 https://bioconductor.org/packages/release<type-url-part>/src/contrib/<upstream-name>_<version>.tar.gz

which leads to 2 issues:

 1. when Bioconductor updates their release, some package versions are
updated too, and so upstream returns 404 for the old tarballs.
 2. because of 1., "guix time-machine" is broken for all the
Bioconductor packages, at least when neither Berlin nor SWH has a
substitute; which is not to be expected for 'annotation' packages.

However, the Bioconductor archive still serves the old release, i.e.,

https://bioconductor.org/packages/3.x<type-url-part>/src/contrib/<upstream-name>_<version>.tar.gz


There are two ways to fix both issues:

 a) Add the Bioconductor release (known at packaging time) to all the
packages; provide it as an argument to 'bioconductor-uri'.
 b) Add more fallback URLs.

As discussed on IRC, Tobias seems more inclined towards option a) and
I am more in favour of option b).

Attached is a quick patch illustrating option b).
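
For comparison, option a) could look roughly like the sketch below.  It
is only an illustration: the procedure name 'bioconductor-uri/release'
and its RELEASE argument are hypothetical and not part of
guix/build-system/r.scm, and the package name and version in the example
call are made up.  The type-to-path mapping follows the existing
'bioconductor-uri'.

```scheme
(use-modules (ice-9 match))

;; Hypothetical variant of 'bioconductor-uri' for option a): the
;; Bioconductor RELEASE is passed explicitly instead of relying on the
;; moving "release" alias.
(define* (bioconductor-uri/release name version release #:optional type)
  "Return the URI of package NAME at VERSION as published in Bioconductor
RELEASE.  TYPE is 'annotation, 'experiment, or anything else for software."
  (let ((type-url-part (match type
                         ('annotation "/data/annotation")
                         ('experiment "/data/experiment")
                         (_ "/bioc"))))
    (string-append "https://bioconductor.org/packages/" release
                   type-url-part "/src/contrib/"
                   name "_" version ".tar.gz")))

;; Example call with a made-up package name and version.
(display (bioconductor-uri/release "Foo" "1.0.0" "3.10"))
(newline)
```

With such a procedure each package definition would carry its own
release string, which is what makes option a) explicit but also what
makes it a large change.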


Please also consider #36805, which was neither merged nor closed.
 http://issues.guix.gnu.org/issue/36805


All the best,
simon

[-- Attachment #2: 0001-build-system-r-Use-Bioconductor-old-releases-to-fall.patch --]
[-- Type: text/x-patch, Size: 2041 bytes --]

From 87e73e02202fe5e342d68f1fb17efdd4425737cd Mon Sep 17 00:00:00 2001
From: zimoun <zimon.toutoune@gmail.com>
Date: Tue, 3 Mar 2020 16:53:39 +0100
Subject: [PATCH] build-system: r: Use Bioconductor old releases to fallback.

* guix/build-system/r.scm (bioconductor-uri): Extend the fallback list.
---
 guix/build-system/r.scm | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/guix/build-system/r.scm b/guix/build-system/r.scm
index 2d328764b0..8638e1b888 100644
--- a/guix/build-system/r.scm
+++ b/guix/build-system/r.scm
@@ -54,15 +54,18 @@ release corresponding to NAME and VERSION."
                          ('annotation "/data/annotation")
                          ('experiment "/data/experiment")
                          (_ "/bioc"))))
-    (list (string-append "https://bioconductor.org/packages/release"
-                         type-url-part
-                         "/src/contrib/"
-                         name "_" version ".tar.gz")
-          ;; TODO: use %bioconductor-version from (guix import cran)
-          (string-append "https://bioconductor.org/packages/3.10"
-                         type-url-part
-                         "/src/contrib/Archive/"
-                         name "_" version ".tar.gz"))))
+    (append (list (string-append "https://bioconductor.org/packages/release"
+                                 type-url-part
+                                 "/src/contrib/"
+                                 name "_" version ".tar.gz"))
+            (map (lambda (release)
+                   (string-append "https://bioconductor.org/packages/"
+                                  release
+                                  type-url-part
+                                  "/src/contrib/"
+                                  name "_" version ".tar.gz"))
+                 (list (@@ (guix import cran) %bioconductor-version)
+                       "3.9" "3.8" "3.7")))))
 
 (define %r-build-system-modules
   ;; Build-side modules imported by default.
-- 
2.25.0



* bug#39885: Bioconductor URI, fallback and time-machine
  2020-03-03 15:59 bug#39885: Bioconductor URI, fallback and time-machine zimoun
@ 2020-03-23 21:20 ` Ricardo Wurmus
  2020-05-21 23:29   ` zimoun
  2020-06-24 11:07 ` zimoun
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 31+ messages in thread
From: Ricardo Wurmus @ 2020-03-23 21:20 UTC (permalink / raw)
  To: zimoun; +Cc: 39885


zimoun <zimon.toutoune@gmail.com> writes:

>  1. when Bioconductor updates their release, some package versions are
> updated too, and so, the upstream return 404.
>  2. for this reason 1., the "guix time-machine" is broken for all the
> Bioconductor packages, at least if Berlin or SWH does not have a
> substitute; which is not expected for 'annotation' packages.
>
> However, the Bioconductor archive still serves the old release, i.e.,
>
> https://bioconductor.org/packages/3.x<type-url-part>/src/contrib/<upstream-name>_<version>.tar.gz
>
>
> The ways to fix the both issues are:
>
>  a) Add the Bioconductor release (known at packaging time) to all the
> packages; provide as argument to 'bioconductor-uri'.
>  b) Add more URLs to fallback.
>
> As discussed on IRC, Tobias seems more inclined with the option a) and
> I am more in favour of option b.

I think option a) is more explicit, which is probably what we generally
want to future-proof the time-machine.  Fallbacks are okay in the case
of the CRAN URL where it’s not necessarily clear when a package tarball
moves from the release location to the archive.

In the case of Bioconductor URLs it seems that we can afford to be a bit
more accurate.

-- 
Ricardo


* bug#39885: Bioconductor URI, fallback and time-machine
  2020-03-23 21:20 ` Ricardo Wurmus
@ 2020-05-21 23:29   ` zimoun
  0 siblings, 0 replies; 31+ messages in thread
From: zimoun @ 2020-05-21 23:29 UTC (permalink / raw)
  To: Ricardo Wurmus; +Cc: 39885

Dear Ricardo,

On Mon, 23 Mar 2020 at 22:21, Ricardo Wurmus <rekado@elephly.net> wrote:

> >  a) Add the Bioconductor release (known at packaging time) to all the
> > packages; provide as argument to 'bioconductor-uri'.
> >  b) Add more URLs to fallback.
> >
> > As discussed on IRC, Tobias seems more inclined with the option a) and
> > I am more in favour of option b.
>
> I think option a) is more explicit, which is probably what we generally
> want to future-proof the time-machine.  Fallbacks are okay in the case
> of the CRAN URL where it’s not necessarily clear when a package tarball
> moves from the release location to the archive.
>
> In the case of Bioconductor URLs it seems that we can afford to be a bit
> more accurate.

We are going for option a), which means rewriting all the URLs, right?

Because that is a lot of work, I suggest we first address bug#36805 [1],
i.e., pass the Bioconductor version as an argument to 'bioconductor-uri',
and apply this policy to all new packages and to any package update.

Moreover, I have suggested reorganising bioconductor.scm,
bioinformatics.scm, cran.scm, etc., and I have not yet dedicated enough
time to this boring task.  But because I am working remotely
(semi-lockdown), I plan to work on it next week, so this change of
URLs could be part of the big reorganisation.

What do you think?

[1] http://issues.guix.gnu.org/issue/36805


All the best,
simon





* bug#39885: Bioconductor URI, fallback and time-machine
  2020-03-03 15:59 bug#39885: Bioconductor URI, fallback and time-machine zimoun
  2020-03-23 21:20 ` Ricardo Wurmus
@ 2020-06-24 11:07 ` zimoun
  2020-06-28 20:14   ` Ludovic Courtès
  2020-11-19 14:22 ` zimoun
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 31+ messages in thread
From: zimoun @ 2020-06-24 11:07 UTC (permalink / raw)
  To: 39885, Tobias Geerinckx-Rice, Ricardo Wurmus

Dear,

The time-machine is broken for some Bioconductor packages.  For
example, consider the package "r-genomegraphs", which was removed
from Bioconductor in the 3.11 release.

(Well, the issue is now mitigated because ci.guix.gnu.org serves a lot
of substitutes for upstream sources, but ci.guix.gnu.org could be down.
In other words, we should use upstream resources wherever they are
available.)


Concretely, there are 2 issues:

 a) What to do about the removed packages?  For 3.11, the list is here
[1].  Do we keep them in gnu/packages/bioconductor.scm, in which case
'bioconductor-uri' needs some tweaks?  Or do we move them to the
guix-past channel (for example)?

 b) The fallback URI in guix/build-system/r.scm (bioconductor-uri)
added by commit c586f427b4831b9b492e5b900b2226e898b8fcfa is not
correct, if I am not misreading:

--8<---------------cut here---------------start------------->8---
"https://bioconductor.org/packages/3.10/bioc/src/contrib/Archive/GenomeGraphs_1.46.0.tar.gz"
404 "Not Found"
--8<---------------cut here---------------end--------------->8---

The correct URL seems to be (without Archive):

https://bioconductor.org/packages/3.10/bioc/src/contrib/GenomeGraphs_1.46.0.tar.gz


All the best,
simon

1: https://bioconductor.org/news/bioc_3_11_release/#deprecated-and-defunct-packages





* bug#39885: Bioconductor URI, fallback and time-machine
  2020-06-24 11:07 ` zimoun
@ 2020-06-28 20:14   ` Ludovic Courtès
  2020-06-29 17:36     ` zimoun
  0 siblings, 1 reply; 31+ messages in thread
From: Ludovic Courtès @ 2020-06-28 20:14 UTC (permalink / raw)
  To: zimoun; +Cc: 39885

Hi,

zimoun <zimon.toutoune@gmail.com> skribis:

>  b) The fallback URI in guix/build-system/r.scm(bioconductor-uri)
> added by commit c586f427b4831b9b492e5b900b2226e898b8fcfa is not
> correct, if I do not misread:
>
> "https://bioconductor.org/packages/3.10/bioc/src/contrib/Archive/GenomeGraphs_1.46.0.tar.gz"
> 404 "Not Found"
>
> The correct seems to be (without Archive):
>
> https://bioconductor.org/packages/3.10/bioc/src/contrib/GenomeGraphs_1.46.0.tar.gz

Could you provide a patch for this?

Thanks,
Ludo’.





* bug#39885: Bioconductor URI, fallback and time-machine
  2020-06-28 20:14   ` Ludovic Courtès
@ 2020-06-29 17:36     ` zimoun
  2020-06-29 20:42       ` Ludovic Courtès
  0 siblings, 1 reply; 31+ messages in thread
From: zimoun @ 2020-06-29 17:36 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 39885

Hi Ludo,

On Sun, 28 Jun 2020 at 22:14, Ludovic Courtès <ludo@gnu.org> wrote:

> Could you provide a patch for this?

For the URL, sure; see attached.

But it does not address the root of the problem.  I will try to
find some time and propose something.


All the best,
simon

[-- Attachment #2: 0001-build-system-r-bioconductor-uri-Fix-archive-URL.patch --]
[-- Type: text/x-patch, Size: 1014 bytes --]

From c1c963a3b86e306a20c14626127e54d21843c22c Mon Sep 17 00:00:00 2001
From: zimoun <zimon.toutoune@gmail.com>
Date: Mon, 29 Jun 2020 19:18:20 +0200
Subject: [PATCH] build-system/r: bioconductor-uri: Fix archive URL.

* guix/build-system/r.scm (bioconductor-uri): Fix archive URL.
---
 guix/build-system/r.scm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/guix/build-system/r.scm b/guix/build-system/r.scm
index c8ec9abd0d..5ef982d66a 100644
--- a/guix/build-system/r.scm
+++ b/guix/build-system/r.scm
@@ -61,7 +61,7 @@ release corresponding to NAME and VERSION."
           ;; TODO: use %bioconductor-version from (guix import cran)
           (string-append "https://bioconductor.org/packages/3.11"
                          type-url-part
-                         "/src/contrib/Archive/"
+                         "/src/contrib/"
                          name "_" version ".tar.gz"))))
 
 (define %r-build-system-modules

base-commit: 6ebf300959a58fd1eda875205c75d21137862285
-- 
2.26.2



* bug#39885: Bioconductor URI, fallback and time-machine
  2020-06-29 17:36     ` zimoun
@ 2020-06-29 20:42       ` Ludovic Courtès
  0 siblings, 0 replies; 31+ messages in thread
From: Ludovic Courtès @ 2020-06-29 20:42 UTC (permalink / raw)
  To: zimoun; +Cc: 39885

zimoun <zimon.toutoune@gmail.com> skribis:

> From c1c963a3b86e306a20c14626127e54d21843c22c Mon Sep 17 00:00:00 2001
> From: zimoun <zimon.toutoune@gmail.com>
> Date: Mon, 29 Jun 2020 19:18:20 +0200
> Subject: [PATCH] build-system/r: bioconductor-uri: Fix archive URL.
>
> * guix/build-system/r.scm (bioconductor-uri): Fix archive URL.

Applied, thanks!

I'll let the rest of you discuss the other issues.  :-)

Ludo’.





* bug#39885: Bioconductor URI, fallback and time-machine
  2020-03-03 15:59 bug#39885: Bioconductor URI, fallback and time-machine zimoun
  2020-03-23 21:20 ` Ricardo Wurmus
  2020-06-24 11:07 ` zimoun
@ 2020-11-19 14:22 ` zimoun
  2021-11-22 19:48 ` zimoun
  2022-07-18 16:03 ` zimoun
  4 siblings, 0 replies; 31+ messages in thread
From: zimoun @ 2020-11-19 14:22 UTC (permalink / raw)
  To: 39885

Hi,

Some explanations of the issue are provided here:

    <http://issues.guix.gnu.org/issue/39885>

Since we are currently updating to 3.12, this may be a good occasion to
fix the issue.  See option a) below.


On Tue, 03 Mar 2020 at 16:59, zimoun <zimon.toutoune@gmail.com> wrote:

> Currently, the URI scheme (see 'bioconductor-uri' in
> guix/build-system/r.scm) is:
>
>  https://bioconductor.org/packages/release<type-url-part>/src/contrib/<upstream-name>_<version>.tar.gz
>
> which leads to 2 issues:
>
>  1. when Bioconductor updates their release, some package versions are
> updated too, and so, the upstream return 404.
>
>  2. for this reason 1., the "guix time-machine" is broken for all the
> Bioconductor packages, at least if Berlin or SWH does not have a
> substitute; which is not expected for 'annotation' packages.

Here is an example of this issue:

--8<---------------cut here---------------start------------->8---
$ guix time-machine --commit=aee183e -- import cran -a bioconductor CATALYST -r

Updating channel 'guix' from Git repository at 'https://git.savannah.gnu.org/git/guix.git'...

Starting download of /tmp/guix-file.Nxajqh
From https://bioconductor.org/packages/release/bioc/src/contrib/CATALYST_1.12.2.tar.gz...
download failed "https://bioconductor.org/packages/release/bioc/src/contrib/CATALYST_1.12.2.tar.gz" 404 "Not Found"
failed to download "/tmp/guix-file.Nxajqh" from "https://bioconductor.org/packages/release/bioc/src/contrib/CATALYST_1.12.2.tar.gz"
error: failed to retrieve package information from "https://cran.r-project.org/web/packages/CATALYST/DESCRIPTION": 404 ("Not Found")
Backtrace:
           4 (primitive-load "/home/simon/.cache/guix/inferiors/vznc…")
In guix/ui.scm:
  2117:12  3 (run-guix-command _ . _)
In guix/scripts/import.scm:
   120:11  2 (guix-import . _)
In srfi/srfi-1.scm:
   586:17  1 (map1 (#f))
In guix/import/utils.scm:
    258:2  0 (package->definition _)

guix/import/utils.scm:258:2: In procedure package->definition:
Throw to key `match-error' with args `("match" "no matching pattern" #f)'.
--8<---------------cut here---------------end--------------->8---

Aside from the ugly backtrace, which is tracked by #44115, the main
issue is that Bioconductor has updated to 3.12 while Guix is still at
3.11.

Concretely, the issue is that ’release’ in the URL:

<https://bioconductor.org/packages/release/bioc/src/contrib/CATALYST_1.12.2.tar.gz>

now refers to 3.12 (because Bioconductor updated) while Guix still
thinks it is 3.11 (because Guix has not yet updated; work in progress).
And CATALYST is at version 1.14.0 in 3.12, against 1.12.2 in 3.11.
Hence the mismatch and the error.

It means that as long as:

    (define %bioconductor-version "3.11")

is not updated to 3.12, all the Bioconductor packages are broken, in the
sense that they cannot be built from source.
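
The mismatch can be spelled out with a toy model.  The sketch below is
not Guix code: the procedure names are made up, and it assumes (as the
404 above suggests) that a release directory only serves the package
version belonging to that release.  The CATALYST versions are the ones
given above.

```scheme
;; CATALYST version shipped by each Bioconductor release (from this message).
(define %catalyst-versions
  '(("3.11" . "1.12.2")
    ("3.12" . "1.14.0")))

;; "release" is an alias for whatever upstream currently considers current.
(define (resolve-release release upstream-current)
  (if (string=? release "release")
      upstream-current
      release))

;; Toy assumption: a directory serves only the version of its own release.
(define (served? release version upstream-current)
  (equal? (assoc-ref %catalyst-versions
                     (resolve-release release upstream-current))
          version))

;; Guix still packages 1.12.2 while upstream has moved on to 3.12.
(display (served? "release" "1.12.2" "3.12")) ; via the moving alias
(newline)
(display (served? "3.11" "1.12.2" "3.12"))    ; via the pinned release
(newline)
```

Under these assumptions the first call returns #f and the second #t: the
same tarball is missing through the alias but found through the pinned
release.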


>  a) Add the Bioconductor release (known at packaging time) to all the
> packages; provide as argument to 'bioconductor-uri'.
>  b) Add more URLs to fallback, e.g.:
>
> https://bioconductor.org/packages/release<type-url-part>/src/contrib/<upstream-name>_<version>.tar.gz
> https://bioconductor.org/packages/3.11<type-url-part>/src/contrib/<upstream-name>_<version>.tar.gz
>
> Attached, a quick patch showing the option b).

Then each time we update Bioconductor, we add a URL to the list.


> As discussed on IRC, Tobias seems more inclined with the option a) and
> I am more in favour of option b.

Tobias and Ricardo are in favor of a) (see this thread).  That means a
lot of work IMHO, i.e., adding 3.11 as an argument, and then 3.12, to
all the Bioconductor packages and fixing the importer, IIUC; while b)
means doing nothing except merging the proposed patch (possibly
reworked).

Considering that even the task of grouping in bioconductor.scm all the
Bioconductor packages scattered here and there is still not done, I
think option a) is not doable by hand – I do not volunteer! :-)
Otherwise, any suggestion for scripting the task?

Since I am more in favor of b), I am less motivated to work on a). ;-)
But I am motivated to fix the issue at hand. :-)


Another option, c), is to switch all the Bioconductor packages to
git-fetch instead of url-fetch.  I have not yet checked what the
transition would look like.


> Please also consider #36805 which was never merged or closed.
>  http://issues.guix.gnu.org/issue/36805

This patch could help for option a).


WDYT?

All the best,
simon





* bug#39885: Bioconductor URI, fallback and time-machine
  2020-03-03 15:59 bug#39885: Bioconductor URI, fallback and time-machine zimoun
                   ` (2 preceding siblings ...)
  2020-11-19 14:22 ` zimoun
@ 2021-11-22 19:48 ` zimoun
  2022-07-18 16:03 ` zimoun
  4 siblings, 0 replies; 31+ messages in thread
From: zimoun @ 2021-11-22 19:48 UTC (permalink / raw)
  To: 39885; +Cc: rekado

Hi,

On Tue, 03 Mar 2020 at 16:59, zimoun <zimon.toutoune@gmail.com> wrote:

> Currently, the URI scheme (see 'bioconductor-uri' in
> guix/build-system/r.scm) is:
>
>  https://bioconductor.org/packages/release<type-url-part>/src/contrib/<upstream-name>_<version>.tar.gz
>
> which leads to 2 issues:
>
>  1. when Bioconductor updates their release, some package versions are
> updated too, and so, the upstream return 404.
>  2. for this reason 1., the "guix time-machine" is broken for all the
> Bioconductor packages, at least if Berlin or SWH does not have a
> substitute; which is not expected for 'annotation' packages.
>
> However, the Bioconductor archive still serves the old release, i.e.,
>
> https://bioconductor.org/packages/3.x<type-url-part>/src/contrib/<upstream-name>_<version>.tar.gz

This is still the case; for concrete breakage, see [1].  I will not go
into detail, but each time Guix lags behind a new Bioconductor release,
it is broken.  For sure, Guix upgrades more or less quickly.  Each time
Bioconductor removes a package, it is broken.  Well, because R packages
receive a lot of care, forward breakage rarely happens. :-)  But
backward breakage is not negligible, IMHO.


Well, this URL choice is not The Right Thing and is somehow broken by
design.

1: <https://issues.guix.gnu.org/39885#7>


> The ways to fix the both issues are:
>
>  a) Add the Bioconductor release (known at packaging time) to all the
> packages; provide as argument to 'bioconductor-uri'.
>  b) Add more URLs to fallback.
>
> As discussed on IRC, Tobias seems more inclined with the option a) and
> I am more in favour of option b.
>
> Attached, a quick patch showing the option b).

We are now 1.5 years later.  And we did nothing; well, we did other
things instead. ;-)  Now, I have a strong opinion that option a) is
not doable: I speak from my janitorial moves of Bioconductor packages.

Instead, something along the lines of the proposed patch below
half-fixes the issue now.  We just have to append the releases and let
the fallback mechanism take care of the rest.  It reduces the
maintenance burden, IMHO.

For sure, it is not perfect, but it appears to me a pragmatic fix
while waiting for something better.


What that something better is remains unknown (at least to me :-)).  On
one hand, Disarchive would improve the situation for tarballs… but some
work remains (check that SWH ingestion and rebuild are bullet-proof).
On the other hand, Bioconductor uses Git, for instance:

    git clone https://git.bioconductor.org/packages/CATALYST

<https://bioconductor.org/packages/release/bioc/html/CATALYST.html>

And Bioconductor uses ’origin/RELEASE_3_14’ as the Git branch for a
release.  Relying on this would avoid the eternal in-place-change fixes.

Take for instance the package tximeta [2], recently updated by Ricardo.
Well, from their Bioconductor Git repo,

    git clone https://git.bioconductor.org/packages/tximeta

it is not clear that the current version is 1.12.3.  And it is not
clear either whether they tagged origin/RELEASE_3_14 at 1.12.0 and did
something ugly to then get 1.12.3.  Anyway, switching from url-fetch to
git-fetch is an option.  However, as with option a), I am not
convinced it is doable with the resources at hand.

2: <https://bioconductor.org/packages/3.14/bioc/html/tximeta.html>


What could be a plan to get a bullet-proof “guix time-machine” for
Bioconductor?


Cheers,
simon


> From 87e73e02202fe5e342d68f1fb17efdd4425737cd Mon Sep 17 00:00:00 2001
> From: zimoun <zimon.toutoune@gmail.com>
> Date: Tue, 3 Mar 2020 16:53:39 +0100
> Subject: [PATCH] build-system: r: Use Bioconductor old releases to fallback.
>
> * guix/build-system/r.scm (bioconductor-uri): Extend the fallback list.
> ---
>  guix/build-system/r.scm | 21 ++++++++++++---------
>  1 file changed, 12 insertions(+), 9 deletions(-)
>
> diff --git a/guix/build-system/r.scm b/guix/build-system/r.scm
> index 2d328764b0..8638e1b888 100644
> --- a/guix/build-system/r.scm
> +++ b/guix/build-system/r.scm
> @@ -54,15 +54,18 @@ release corresponding to NAME and VERSION."
>                           ('annotation "/data/annotation")
>                           ('experiment "/data/experiment")
>                           (_ "/bioc"))))
> -    (list (string-append "https://bioconductor.org/packages/release"
> -                         type-url-part
> -                         "/src/contrib/"
> -                         name "_" version ".tar.gz")
> -          ;; TODO: use %bioconductor-version from (guix import cran)
> -          (string-append "https://bioconductor.org/packages/3.10"
> -                         type-url-part
> -                         "/src/contrib/Archive/"
> -                         name "_" version ".tar.gz"))))
> +    (append (list (string-append "https://bioconductor.org/packages/release"
> +                                 type-url-part
> +                                 "/src/contrib/"
> +                                 name "_" version ".tar.gz"))
> +            (map (lambda (release)
> +                   (string-append "https://bioconductor.org/packages/"
> +                                  release
> +                                  type-url-part
> +                                  "/src/contrib/"
> +                                  name "_" version ".tar.gz"))
> +                 (list (@@ (guix import cran) %bioconductor-version)
> +                       "3.9" "3.8" "3.7")))))
>
>  (define %r-build-system-modules
>    ;; Build-side modules imported by default.





* bug#39885: Bioconductor URI, fallback and time-machine
  2020-03-03 15:59 bug#39885: Bioconductor URI, fallback and time-machine zimoun
                   ` (3 preceding siblings ...)
  2021-11-22 19:48 ` zimoun
@ 2022-07-18 16:03 ` zimoun
  2022-07-18 16:21   ` Ricardo Wurmus
                     ` (2 more replies)
  4 siblings, 3 replies; 31+ messages in thread
From: zimoun @ 2022-07-18 16:03 UTC (permalink / raw)
  To: 39885; +Cc: rekado, Timothy Sample, me

Hi,

Since 2020, I have provided several examples of breakage in bug#39885 [1].
Here is another one:

--8<---------------cut here---------------start------------->8---
$ guix time-machine --commit=77e2de365497bf4c8b81cbd78624f78293490485 \
       -- build r-biocneighbors -S
substitute: updating substitutes from 'https://ci.guix.gnu.org'... 100.0%
substitute: updating substitutes from 'https://bordeaux.guix.gnu.org'... 100.0%
The following derivation will be built:
   /gnu/store/q9ggmh5a9bzmnr49p10x1w9sv6pzjarv-BiocNeighbors_1.4.1.tar.gz.drv
building /gnu/store/q9ggmh5a9bzmnr49p10x1w9sv6pzjarv-BiocNeighbors_1.4.1.tar.gz.drv...

Starting download of /gnu/store/zgf7x09kgiqbvj0dmhplxi1xzpljxd7k-BiocNeighbors_1.4.1.tar.gz
From https://bioconductor.org/packages/release/bioc/src/contrib/BiocNeighbors_1.4.1.tar.gz...
download failed "https://bioconductor.org/packages/release/bioc/src/contrib/BiocNeighbors_1.4.1.tar.gz" 404 "Not Found"

Starting download of /gnu/store/zgf7x09kgiqbvj0dmhplxi1xzpljxd7k-BiocNeighbors_1.4.1.tar.gz
From https://bioconductor.org/packages/3.10/bioc/src/contrib/Archive/BiocNeighbors_1.4.1.tar.gz...
download failed "https://bioconductor.org/packages/3.10/bioc/src/contrib/Archive/BiocNeighbors_1.4.1.tar.gz" 404 "Not Found"

Starting download of /gnu/store/zgf7x09kgiqbvj0dmhplxi1xzpljxd7k-BiocNeighbors_1.4.1.tar.gz
From https://ci.guix.gnu.org/file/BiocNeighbors_1.4.1.tar.gz/sha256/05vi1cij37s8wgj92k3l6a3f3dwldj8jvijdp4695zczka6kypdf...
download failed "https://ci.guix.gnu.org/file/BiocNeighbors_1.4.1.tar.gz/sha256/05vi1cij37s8wgj92k3l6a3f3dwldj8jvijdp4695zczka6kypdf" 404 "Not Found"

Starting download of /gnu/store/zgf7x09kgiqbvj0dmhplxi1xzpljxd7k-BiocNeighbors_1.4.1.tar.gz
From https://tarballs.nixos.org/sha256/05vi1cij37s8wgj92k3l6a3f3dwldj8jvijdp4695zczka6kypdf...
download failed "https://tarballs.nixos.org/sha256/05vi1cij37s8wgj92k3l6a3f3dwldj8jvijdp4695zczka6kypdf" 404 "Not Found"

Starting download of /gnu/store/zgf7x09kgiqbvj0dmhplxi1xzpljxd7k-BiocNeighbors_1.4.1.tar.gz
From https://archive.softwareheritage.org/api/1/content/sha256:ae5d3f8d9a9ffd920cb94dc62d916c94b7e18632744c91e4e3489f21230b7117/raw/...
download failed "https://archive.softwareheritage.org/api/1/content/sha256:ae5d3f8d9a9ffd920cb94dc62d916c94b7e18632744c91e4e3489f21230b7117/raw/" 404 "Not Found"

Starting download of /gnu/store/zgf7x09kgiqbvj0dmhplxi1xzpljxd7k-BiocNeighbors_1.4.1.tar.gz
From https://web.archive.org/web/20220718175152/https://bioconductor.org/packages/release/bioc/src/contrib/BiocNeighbors_1.4.1.tar.gz...
download failed "https://web.archive.org/web/20220718175152/https://bioconductor.org/packages/release/bioc/src/contrib/BiocNeighbors_1.4.1.tar.gz" 404 "NOT FOUND"
Trying to use Disarchive to assemble /gnu/store/zgf7x09kgiqbvj0dmhplxi1xzpljxd7k-BiocNeighbors_1.4.1.tar.gz...
could not find its Disarchive specification
failed to download "/gnu/store/zgf7x09kgiqbvj0dmhplxi1xzpljxd7k-BiocNeighbors_1.4.1.tar.gz" from ("https://bioconductor.org/packages/release/bioc/src/contrib/BiocNeighbors_1.4.1.tar.gz" "https://bioconductor.org/packages/3.10/bioc/src/contrib/Archive/BiocNeighbors_1.4.1.tar.gz")
builder for `/gnu/store/q9ggmh5a9bzmnr49p10x1w9sv6pzjarv-BiocNeighbors_1.4.1.tar.gz.drv' failed to produce output path `/gnu/store/zgf7x09kgiqbvj0dmhplxi1xzpljxd7k-BiocNeighbors_1.4.1.tar.gz'
build of /gnu/store/q9ggmh5a9bzmnr49p10x1w9sv6pzjarv-BiocNeighbors_1.4.1.tar.gz.drv failed
View build log at '/var/log/guix/drvs/q9/ggmh5a9bzmnr49p10x1w9sv6pzjarv-BiocNeighbors_1.4.1.tar.gz.drv.gz'.
guix build: error: build of `/gnu/store/q9ggmh5a9bzmnr49p10x1w9sv6pzjarv-BiocNeighbors_1.4.1.tar.gz.drv' failed
--8<---------------cut here---------------end--------------->8---

Well, several comments:

 1. Neither Berlin nor Bordeaux has it as a substitute,
 2. Disarchive does not have it,
 3. Neither do the other fallbacks.

But the question in the first place is: why is Bioconductor failing?
Because they do ugly things!

Our history reads:

f431d5e299 Sun Dec 15 15:38:51 2019 +0100 guix: Upgrade to Bioconductor 3.10
12e2aa96dc Sun Dec 15 15:38:55 2019 +0100 gnu: r-biocneighbors: Update to 1.4.1.
aece78fe2f Sun Mar 1 23:38:12 2020 +0100 gnu: r-biocneighbors: Update to 1.4.2.
8e518d4802 Sat Jun 13 01:19:38 2020 +0200 guix: Update to Bioconductor 3.11.

which means that Bioconductor removed v1.4.1 from their URI scheme
(I do not even know whether the tarball is still available on their
infra); so even though Bioconductor v3.10 had released v1.4.1, its URL
is not stable.

At the cost of more bandwidth, we could switch from url-fetch to
git-fetch.  Or we also could examine why Disarchive is failing here.


1: <http://issues.guix.gnu.org/issue/39885>

Cheers,
simon





* bug#39885: Bioconductor URI, fallback and time-machine
  2022-07-18 16:03 ` zimoun
@ 2022-07-18 16:21   ` Ricardo Wurmus
  2022-08-10 18:25     ` Ricardo Wurmus
  2023-12-22 13:40   ` bug#39885: Bioconductor tarballs are not archived Ludovic Courtès
  2023-12-22 20:57   ` bug#39885: Bioconductor URI, fallback and time-machine Ludovic Courtès
  2 siblings, 1 reply; 31+ messages in thread
From: Ricardo Wurmus @ 2022-07-18 16:21 UTC (permalink / raw)
  To: zimoun; +Cc: Timothy Sample, 39885, me


zimoun <zimon.toutoune@gmail.com> writes:

> At the cost of more bandwidth, we could switch from url-fetch to
> git-fetch.

Let’s do it!  I’m tired of Bioconductor archive shenanigans messing with
package availability.

-- 
Ricardo





* bug#39885: Bioconductor URI, fallback and time-machine
  2022-07-18 16:21   ` Ricardo Wurmus
@ 2022-08-10 18:25     ` Ricardo Wurmus
  2022-08-10 19:44       ` Maxime Devos
                         ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Ricardo Wurmus @ 2022-08-10 18:25 UTC (permalink / raw)
  To: zimoun; +Cc: Timothy Sample, 39885, me


Ricardo Wurmus <rekado@elephly.net> writes:

> zimoun <zimon.toutoune@gmail.com> writes:
>
>> At the cost of more bandwidth, we could switch from url-fetch to
>> git-fetch.
>
> Let’s do it!  I’m tired of Bioconductor archive shenanigans messing with
> package availability.

I have finally taken the time to review this and implement a first draft
of a change to the bioconductor importer and updater.

There are some limitations:

- we cannot use the updater to go from “url-fetch” to “git-fetch”.
  That’s because “package-update” in (guix upstream) decides whether to
  use package-update/url-fetch or package-update/git-fetch based on the
  *current* package value’s origin fetch procedure.  For the switch we
  can hack around this (adding an exception for bioconductor packages),
  but there is no pretty way to do this in a generic fashion that could
  be committed.

  Perhaps we could operate on the url included in the <upstream-source>
  instead of looking at the *current* package value.  We’re only
  accessing “package” once in the url-fetch case, so maybe we can work
  around this problem.

- the repositories at https://git.bioconductor.org/packages/NAME do not
  tag package versions.  The only method of organization is branches
  that are named after *Bioconductor releases* (not package releases),
  e.g. RELEASE_3_15.  We can only determine the package version by
  reading its DESCRIPTION file or by looking up the version index for
  all Bioconductor packages (we do that already).  This means that there
  could be different commits for the same package version in the same
  release branch — so we have to include the commit hash and a revision
  counter in the version string.

- the updater doesn’t work on version expressions like (git-version
  "1.12" revision commit).  It expects to be able to replace literal
  strings.  Because of that my changes let the importer generate a
  string literal such as "1.12-0.cafebab" without a let-bound commit
  string.

- “experiment” or “data” packages are not kept in Git.  They only exist
  as volatile tarballs that will be overwritten.  Thankfully, they don’t
  change all that often, so they have a good chance of making it into
  our archives.

- the above exception means that we need to litter the importer and
  updater code with extra checks.
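
To make the version-literal point concrete, here is a minimal sketch in
Python (not the importer's code) of how a literal such as
"1.12-0.cafebab" could be assembled and split again.  The function names
are hypothetical, and the 7-character commit prefix is an assumption
inferred from the example above.

```python
def version_literal(version, revision, commit):
    """Build 'VERSION-REVISION.COMMITPREFIX', e.g. '1.12-0.cafebab'."""
    # Assumption: only the first 7 characters of the commit are kept.
    return f"{version}-{revision}.{commit[:7]}"


def split_version_literal(literal):
    """Split the literal back into (version, revision, commit_prefix)."""
    # The package version itself contains dots, so split on the last '-'.
    version, _, rest = literal.rpartition("-")
    revision, _, commit_prefix = rest.partition(".")
    return version, int(revision), commit_prefix


literal = version_literal("1.12", 0, "cafebabe0123456789")
print(literal)
print(split_version_literal(literal))
```

Because everything lives in one string, an updater that only replaces
literal strings can swap the whole value in a single step.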

With all these notes out of the way I’ll prepare a series of patches
next.

-- 
Ricardo




^ permalink raw reply	[flat|nested] 31+ messages in thread

* bug#39885: Bioconductor URI, fallback and time-machine
  2022-08-10 18:25     ` Ricardo Wurmus
@ 2022-08-10 19:44       ` Maxime Devos
  2022-08-10 19:48         ` Maxime Devos
  2022-09-09 17:23       ` zimoun
  2024-01-08 15:07       ` Ludovic Courtès
  2 siblings, 1 reply; 31+ messages in thread
From: Maxime Devos @ 2022-08-10 19:44 UTC (permalink / raw)
  To: Ricardo Wurmus, zimoun; +Cc: Timothy Sample, 39885, me


[-- Attachment #1.1.1: Type: text/plain, Size: 489 bytes --]


On 10-08-2022 20:25, Ricardo Wurmus wrote:
> - the updater doesn’t work on version expressions like (git-version
>    "1.12" revision commit).  It expects to be able to replace literal
>    strings.  Because of that my changes let the importer generate a
>    string literal such as "1.12-0.cafebab" without a let-bound commit
>    string.
I've a patch that implements replacing (revision "N") by (revision 
"N+1"), apparently it's not applied yet but let me search for it ...


^ permalink raw reply	[flat|nested] 31+ messages in thread

* bug#39885: Bioconductor URI, fallback and time-machine
  2022-08-10 19:44       ` Maxime Devos
@ 2022-08-10 19:48         ` Maxime Devos
  0 siblings, 0 replies; 31+ messages in thread
From: Maxime Devos @ 2022-08-10 19:48 UTC (permalink / raw)
  To: Ricardo Wurmus, zimoun; +Cc: Timothy Sample, 39885, me


[-- Attachment #1.1.1: Type: text/plain, Size: 818 bytes --]


On 10-08-2022 21:44, Maxime Devos wrote:
>
> On 10-08-2022 20:25, Ricardo Wurmus wrote:
>> - the updater doesn’t work on version expressions like (git-version
>>    "1.12" revision commit).  It expects to be able to replace literal
>>    strings.  Because of that my changes let the importer generate a
>>    string literal such as "1.12-0.cafebab" without a let-bound commit
>>    string.
> I've a patch that implements replacing (revision "N") by (revision 
> "N+1"), apparently it's not applied yet but let me search for it ...

Found it:

<https://issues.guix.gnu.org/53144#13>

That patch series was written with Minetest / ContentDB and a new 
'latest-git' updater in mind, but the ContentDB and latest-git bits 
should be separable without much trouble.
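
The idea, sketched very roughly in Python (this is not the code from
that patch series, and the regular expression is only an assumption
about how a let-bound revision looks in a package definition):

```python
import re

# Matches a let-binding of the form (revision "N") where N is an integer.
REVISION_RE = re.compile(r'\(revision "(\d+)"\)')


def bump_revision(source_text):
    """Return source_text with the first (revision "N") replaced by N+1."""
    def bump(match):
        return f'(revision "{int(match.group(1)) + 1}")'

    return REVISION_RE.sub(bump, source_text, count=1)


print(bump_revision('(let ((commit "cafebab") (revision "0")) ...)'))
```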

Greetings,
Maxime.



^ permalink raw reply	[flat|nested] 31+ messages in thread

* bug#39885: Bioconductor URI, fallback and time-machine
  2022-08-10 18:25     ` Ricardo Wurmus
  2022-08-10 19:44       ` Maxime Devos
@ 2022-09-09 17:23       ` zimoun
  2024-01-08 15:07       ` Ludovic Courtès
  2 siblings, 0 replies; 31+ messages in thread
From: zimoun @ 2022-09-09 17:23 UTC (permalink / raw)
  To: Ricardo Wurmus; +Cc: Timothy Sample, 39885, me

Hi Ricardo,

I am late.  This message landed when I was traveling for holidays. :-)

On Wed, 10 Aug 2022 at 20:25, Ricardo Wurmus <rekado@elephly.net> wrote:

> - we cannot use the updater to go from “url-fetch” to “git-fetch”.
>   That’s because “package-update” in (guix upstream) decides whether to
>   use package-update/url-fetch or package-update/git-fetch based on the
>   *current* package value’s origin fetch procedure.  For the switch we
>   can hack around this (adding an exception for bioconductor packages),
>   but there is no pretty way to do this in a generic fashion that could
>   be committed.

It appears to me acceptable to have an exception.  Or even to do it just
once as a big replacement of Bioconductor packages.
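
A rough sketch of what such an exception could look like, in Python
rather than Guile and with hypothetical names throughout (only the two
procedure names and the git.bioconductor.org host come from the message
quoted above):

```python
# Names of the two update procedures mentioned in the quoted message.
UPDATE_PROCEDURES = {
    "url-fetch": "package-update/url-fetch",
    "git-fetch": "package-update/git-fetch",
}


def choose_update_procedure(current_method, upstream_url, allow_switch=False):
    """Pick an update procedure name.

    By default the choice depends only on the *current* fetch method.
    With allow_switch=True (a hypothetical Bioconductor exception), a
    url-fetch package whose upstream source is a Bioconductor Git
    repository is handed to the git-fetch procedure instead.
    """
    if (allow_switch
            and current_method == "url-fetch"
            and upstream_url.startswith("https://git.bioconductor.org/")):
        return UPDATE_PROCEDURES["git-fetch"]
    return UPDATE_PROCEDURES[current_method]
```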

> - the repositories at https://git.bioconductor.org/package/NAME do not
>   tag package versions.  The only method of organization is branches
>   that are named after *Bioconductor releases* (not package releases),
>   e.g. RELEASE_3_15.  We can only determine the package version by
>   reading its DESCRIPTION file or by looking up the version index for
>   all Bioconductor packages (we do that already).  This means that there
>   could be different commits for the same package version in the same
>   release branch — so we have to include the commit hash and a revision
>   counter in the version string.

This is the most annoying part.  Indeed, when I check out some
Bioconductor Git repositories, I am always confused by their Git
structure.

From my understanding, the tarball you fetch from bioconductor.org has
the same content as the commit tagged “Bioconductor release”
(RELEASE_X_Y).  The content of the upstream release can mismatch the
content of the Bioconductor tarball release.

I do not know how complicated or inaccurate it would be to take the
package version from the Bioconductor index and assign this version to
the commit tagged RELEASE_X_Y.  This commit would appear in the Guix
package definition though.  Or maybe we could transparently use
RELEASE_X_Y to determine this commit.
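
For what it is worth, going from a Bioconductor release number to the
release name is mechanical.  A tiny Python sketch, where the function
name is made up and only the RELEASE_3_15 naming comes from Ricardo's
message:

```python
def release_branch(bioconductor_release):
    """Map a release number such as '3.15' to 'RELEASE_3_15'."""
    major, minor = bioconductor_release.split(".")
    # int() rejects anything that is not a plain number.
    return f"RELEASE_{int(major)}_{int(minor)}"


print(release_branch("3.15"))
```

Resolving that name to a commit would still require querying the
package's Git repository.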


> - the updater doesn’t work on version expressions like (git-version
>   "1.12" revision commit).  It expects to be able to replace literal
>   strings.  Because of that my changes let the importer generate a
>   string literal such as "1.12-0.cafebab" without a let-bound commit
>   string.

Maxime pointed to patch#53144 [1], but I have not looked at it yet.


1: <https://issues.guix.gnu.org/53144#13>


> - “experiment” or “data” packages are not kept in Git.  They only exist
>   as volatile tarballs that will be overwritten.  Thankfully, they don’t
>   change all that often, so they have a good chance of making it into
>   our archives.

That’s an interesting question for Disarchive and Software Heritage.


Cheers,
simon




^ permalink raw reply	[flat|nested] 31+ messages in thread

* bug#39885: Bioconductor tarballs are not archived
  2022-07-18 16:03 ` zimoun
  2022-07-18 16:21   ` Ricardo Wurmus
@ 2023-12-22 13:40   ` Ludovic Courtès
  2024-01-08  9:09     ` Simon Tournier
  2024-01-19 15:46     ` Timothy Sample
  2023-12-22 20:57   ` bug#39885: Bioconductor URI, fallback and time-machine Ludovic Courtès
  2 siblings, 2 replies; 31+ messages in thread
From: Ludovic Courtès @ 2023-12-22 13:40 UTC (permalink / raw)
  To: zimoun; +Cc: rekado, Timothy Sample, 39885, me

Hello!

zimoun <zimon.toutoune@gmail.com> skribis:

> Since 2020, I provided several examples of breakage with bug#39885 [1].
> Here another one:
>
> $ guix time-machine --commit=77e2de365497bf4c8b81cbd78624f78293490485 \
>        -- build r-biocneighbors -S

[...]

> Starting download of /gnu/store/zgf7x09kgiqbvj0dmhplxi1xzpljxd7k-BiocNeighbors_1.4.1.tar.gz
>>From https://web.archive.org/web/20220718175152/https://bioconductor.org/packages/release/bioc/src/contrib/BiocNeighbors_1.4.1.tar.gz...
> download failed "https://web.archive.org/web/20220718175152/https://bioconductor.org/packages/release/bioc/src/contrib/BiocNeighbors_1.4.1.tar.gz" 404 "NOT FOUND"
> Trying to use Disarchive to assemble /gnu/store/zgf7x09kgiqbvj0dmhplxi1xzpljxd7k-BiocNeighbors_1.4.1.tar.gz...
> could not find its Disarchive specification
> failed to download "/gnu/store/zgf7x09kgiqbvj0dmhplxi1xzpljxd7k-BiocNeighbors_1.4.1.tar.gz" from ("https://bioconductor.org/packages/release/bioc/src/contrib/BiocNeighbors_1.4.1.tar.gz" "https://bioconductor.org/packages/3.10/bioc/src/contrib/Archive/BiocNeighbors_1.4.1.tar.gz")
> builder for `/gnu/store/q9ggmh5a9bzmnr49p10x1w9sv6pzjarv-BiocNeighbors_1.4.1.tar.gz.drv' failed to produce output path `/gnu/store/zgf7x09kgiqbvj0dmhplxi1xzpljxd7k-BiocNeighbors_1.4.1.tar.gz'
> build of /gnu/store/q9ggmh5a9bzmnr49p10x1w9sv6pzjarv-BiocNeighbors_1.4.1.tar.gz.drv failed
> View build log at '/var/log/guix/drvs/q9/ggmh5a9bzmnr49p10x1w9sv6pzjarv-BiocNeighbors_1.4.1.tar.gz.drv.gz'.
> guix build: error: build of `/gnu/store/q9ggmh5a9bzmnr49p10x1w9sv6pzjarv-BiocNeighbors_1.4.1.tar.gz.drv' failed
>
> Well, several comments:
>
>  1. Berlin or Bordeaux do not have it as substitutes,
>  2. Disarchive does not have it,
>  3. Many others neither.

I was wondering whether we’re now doing better for Bioconductor
tarballs.  The answer, based on a small sample, seems to be “not quite”:

--8<---------------cut here---------------start------------->8---
$ guix lint -c archival $(guix package -A ^r-bioc | cut -f1)
gnu/packages/bioconductor.scm:19708:12: r-biocbaseutils@1.4.0: Disarchive entry refers to non-existent SWH directory '726af85395d163b5a21e52e4df1bf18aa0072f6b'
gnu/packages/bioconductor.scm:19752:12: r-bioccheck@1.38.0: Disarchive entry refers to non-existent SWH directory '12cfedcbc27005a3fb7e01c5c4b727e0116f596f'
gnu/packages/bioconductor.scm:16892:5: r-biocfilecache@2.10.1: Disarchive entry refers to non-existent SWH directory '6a2d6d909a7cedd56e96f5a98770deeaaaa8d220'
gnu/packages/bioconductor.scm:4540:12: r-biocgenerics@0.48.1: Disarchive entry refers to non-existent SWH directory '6f19ea14f46dbc75909b77bc08e9023daae6fb9e'
gnu/packages/bioconductor.scm:19785:5: r-biocgraph@1.64.0: Disarchive entry refers to non-existent SWH directory '977ff052b4e6c948af7af0fc14ae61f71427cb1a'
gnu/packages/bioconductor.scm:21524:6: r-biocio@1.12.0: Disarchive entry refers to non-existent SWH directory '29d8fef9a5b386384f20513c612f1e34f6118532'
gnu/packages/bioconductor.scm:13090:5: r-biocneighbors@1.20.0: Disarchive entry refers to non-existent SWH directory '6d3728b2dee78cceecdeba0318f3e57b6013d96f'
gnu/packages/bioconductor.scm:19957:5: r-bioconcotk@1.22.0: Disarchive entry refers to non-existent SWH directory '251081d4bc3f061ef8e16338eb042ad4c71ed02d'
gnu/packages/bioconductor.scm:20003:5: r-biocor@1.26.0: Disarchive entry refers to non-existent SWH directory '0cc9d3dcde06fb353cdd77f3b538845d16a77720'
gnu/packages/bioconductor.scm:6613:12: r-biocparallel@1.36.0: Disarchive entry refers to non-existent SWH directory '41e09414898f61655bcc99fdd44d69b0531c0b2d'
gnu/packages/bioconductor.scm:20030:5: r-biocpkgtools@1.20.0: Disarchive entry refers to non-existent SWH directory '55de8618648ed16797a8effd5b508c652a5d7cbe'
gnu/packages/bioconductor.scm:20144:5: r-biocset@1.16.0: Disarchive entry refers to non-existent SWH directory '1cfa6cac0cb453f2882a35c8f5ae6ddfa713ad2d'
gnu/packages/bioconductor.scm:13276:5: r-biocsingular@1.18.0: Disarchive entry refers to non-existent SWH directory '992d3f9d48633fa5d46b9a7640a825054e9538aa'
gnu/packages/bioconductor.scm:19806:12: r-biocstyle@2.30.0: Disarchive entry refers to non-existent SWH directory 'bb17c3bd9ac7c373b24782fcfecdde5fa2f0a965'
gnu/packages/bioconductor.scm:22965:5: r-biocthis@1.12.0: Disarchive entry refers to non-existent SWH directory '3d08f77aae1e81ce9ca9bb9ae2adf4d4c7421d11'
gnu/packages/bioconductor.scm:4521:5: r-biocversion@3.18.1: source not archived on Software Heritage and missing from the Disarchive database
gnu/packages/bioconductor.scm:19830:12: r-biocviews@1.70.0: Disarchive entry refers to non-existent SWH directory '47e0877ab988469fc09a37505dd769f9626cac2e'
gnu/packages/bioconductor.scm:20182:5: r-biocworkflowtools@1.28.0: Disarchive entry refers to non-existent SWH directory '393f3472cc27f632caea3488aef93a7675b403ef'
$ guix describe
Generation 285  Dec 17 2023 23:31:56    (current)
  guix 6ab2426
    repository URL: https://git.savannah.gnu.org/git/guix.git
    branch: master
    commit: 6ab242609daec00e8bd54f7bff54557c92695724
--8<---------------cut here---------------end--------------->8---

In all cases but one, we’re doing the right thing Disarchive-wise, but
SWH did not archive them.

<https://guix.gnu.org/sources.json> has entries like:

--8<---------------cut here---------------start------------->8---
    {
      "type": "url",
      "urls": [
        "https://bioconductor.org/packages/release/bioc/src/contrib/BiocNeighbors_1.20.0.tar.gz",
        "https://bioconductor.org/packages/3.18/bioc/src/contrib/BiocNeighbors_1.20.0.tar.gz",
        "https://bordeaux.guix.gnu.org/file/BiocNeighbors_1.20.0.tar.gz/sha256/0a5wg099fgwjbzd6r3mr4l02rcmjqlkdcz1w97qzwx1mir41fmas",
        "https://ci.guix.gnu.org/file/BiocNeighbors_1.20.0.tar.gz/sha256/0a5wg099fgwjbzd6r3mr4l02rcmjqlkdcz1w97qzwx1mir41fmas",
        "https://tarballs.nixos.org/sha256/0a5wg099fgwjbzd6r3mr4l02rcmjqlkdcz1w97qzwx1mir41fmas"
      ],
      "integrity": "sha256-WlUXSI41dP7xSTx81ibFsrIsACW5jmzaX5I/lxJ4vCg=",
      "outputHashAlgo": "sha256",
      "outputHashMode": "flat"
    },
--8<---------------cut here---------------end--------------->8---

Note that we have at least one copy on our infra:

--8<---------------cut here---------------start------------->8---
$ wget -qO- "https://bordeaux.guix.gnu.org/file/BiocNeighbors_1.20.0.tar.gz/sha256/0a5wg099fgwjbzd6r3mr4l02rcmjqlkdcz1w97qzwx1mir41fmas"|guix hash  - -f base64
WlUXSI41dP7xSTx81ibFsrIsACW5jmzaX5I/lxJ4vCg=
--8<---------------cut here---------------end--------------->8---

<https://ci.guix.gnu.org/file/BiocNeighbors_1.20.0.tar.gz/sha256/0a5wg099fgwjbzd6r3mr4l02rcmjqlkdcz1w97qzwx1mir41fmas>
is 404 (but I can see why: for /file, ‘guix publish’ relies on things
being available in the store and we no longer keep them on ci.guix; we
do have a substitute at
<https://ci.guix.gnu.org/nar/6kfpflffl7b4hx6ibb5k879ar8ffcxb7-BiocNeighbors_1.20.0.tar.gz>
though; we should fix this).

What about hypothesis (2)?  This is what we have:

--8<---------------cut here---------------start------------->8---
$ wget -qO- https://disarchive.guix.gnu.org/sha256//5a5517488e3574fef1493c7cd626c5b2b22c0025b98e6cda5f923f971278bc28 |grep swh
                        (swhid "swh:1:dir:6d3728b2dee78cceecdeba0318f3e57b6013d96f"))
--8<---------------cut here---------------end--------------->8---
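
As a cross-check, the hexadecimal digest in that Disarchive URL is the
same SHA-256 that sources.json stores base64-encoded in its "integrity"
field.  A small Python sketch of the conversion (the function name is
hypothetical):

```python
import base64


def integrity_to_hex(integrity):
    """Convert 'sha256-<base64>' into the hexadecimal digest."""
    algorithm, _, encoded = integrity.partition("-")
    if algorithm != "sha256":
        raise ValueError(f"unsupported algorithm: {algorithm!r}")
    return base64.b64decode(encoded).hex()


print(integrity_to_hex("sha256-WlUXSI41dP7xSTx81ibFsrIsACW5jmzaX5I/lxJ4vCg="))
```

For the BiocNeighbors_1.20.0 entry this yields the digest used in the
wget command above.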

I checked with folks on #swh-devel and it turns out that “the legacy
nixguix lister that is still used in production did not detect the
fallback URL as a tarball URL” (the bordeaux.guix.gnu.org URL), but this
is fixed in the new lister, which should be in production “soon”.

As for past tarballs, #swh-devel comrades say we could send them a list
of URLs and they’d create “Save Code Now” requests on our behalf (we
cannot do it ourselves since the site doesn’t accept plain tarballs.)

Any volunteer to write a script that’d generate a list of Bioconductor
content-addressed URLs (the bordeaux.guix.gnu.org/file ones) for say the
past couple of years?
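
As a starting point, such a script could be as small as the following
Python sketch.  It assumes entries shaped like the sources.json excerpt
above, wrapped in a top-level "sources" list (an assumption here), and
keeps the bordeaux.guix.gnu.org/file URL of every entry that also lists
a bioconductor.org URL.  The function name and the inline sample are for
illustration only.

```python
import json

MIRROR_PREFIX = "https://bordeaux.guix.gnu.org/file/"


def bioconductor_mirror_urls(sources_json_text):
    """Return content-addressed mirror URLs of Bioconductor entries."""
    data = json.loads(sources_json_text)
    selected = []
    for entry in data.get("sources", []):
        urls = entry.get("urls", [])
        # Only keep entries that have at least one bioconductor.org URL.
        if any("bioconductor.org/" in url for url in urls):
            selected.extend(u for u in urls if u.startswith(MIRROR_PREFIX))
    return selected


SAMPLE = json.dumps({"sources": [{
    "type": "url",
    "urls": [
        "https://bioconductor.org/packages/3.18/bioc/src/contrib/BiocNeighbors_1.20.0.tar.gz",
        "https://bordeaux.guix.gnu.org/file/BiocNeighbors_1.20.0.tar.gz/sha256/0a5wg099fgwjbzd6r3mr4l02rcmjqlkdcz1w97qzwx1mir41fmas",
    ],
}]})

for url in bioconductor_mirror_urls(SAMPLE):
    print(url)
```

Covering the past couple of years would mean running this over the
sources.json generated for each Guix revision of interest, not just the
current one.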

Thanks!

Ludo’.




^ permalink raw reply	[flat|nested] 31+ messages in thread

* bug#39885: Bioconductor URI, fallback and time-machine
  2022-07-18 16:03 ` zimoun
  2022-07-18 16:21   ` Ricardo Wurmus
  2023-12-22 13:40   ` bug#39885: Bioconductor tarballs are not archived Ludovic Courtès
@ 2023-12-22 20:57   ` Ludovic Courtès
  2024-01-02  9:20     ` Simon Tournier
  2 siblings, 1 reply; 31+ messages in thread
From: Ludovic Courtès @ 2023-12-22 20:57 UTC (permalink / raw)
  To: zimoun; +Cc: rekado, Timothy Sample, 39885, me

zimoun <zimon.toutoune@gmail.com> skribis:

> $ guix time-machine --commit=77e2de365497bf4c8b81cbd78624f78293490485 \
>        -- build r-biocneighbors -S

[...]

> Trying to use Disarchive to assemble /gnu/store/zgf7x09kgiqbvj0dmhplxi1xzpljxd7k-BiocNeighbors_1.4.1.tar.gz...
> could not find its Disarchive specification
> failed to download "/gnu/store/zgf7x09kgiqbvj0dmhplxi1xzpljxd7k-BiocNeighbors_1.4.1.tar.gz" from ("https://bioconductor.org/packages/release/bioc/src/contrib/BiocNeighbors_1.4.1.tar.gz" "https://bioconductor.org/packages/3.10/bioc/src/contrib/Archive/BiocNeighbors_1.4.1.tar.gz")
> builder for `/gnu/store/q9ggmh5a9bzmnr49p10x1w9sv6pzjarv-BiocNeighbors_1.4.1.tar.gz.drv' failed to produce output path `/gnu/store/zgf7x09kgiqbvj0dmhplxi1xzpljxd7k-BiocNeighbors_1.4.1.tar.gz'

In hindsight this is not surprising: this is a Dec. 2019 commit and I
set up <https://ci.guix.gnu.org/jobset/disarchive>,
disarchive.guix.gnu.org, and related machinery in Sep/Oct 2021.

Of course Timothy set up <https://disarchive.ngyro.com> earlier, but not
much earlier, since Timothy started work on Disarchive ca. July 2020:
<https://issues.guix.gnu.org/42162#15>.

(Not that it helps but at least it’s a relief to know that this
particular problem predates our more serious efforts.)

Ludo’.




^ permalink raw reply	[flat|nested] 31+ messages in thread

* bug#39885: Bioconductor URI, fallback and time-machine
  2023-12-22 20:57   ` bug#39885: Bioconductor URI, fallback and time-machine Ludovic Courtès
@ 2024-01-02  9:20     ` Simon Tournier
  0 siblings, 0 replies; 31+ messages in thread
From: Simon Tournier @ 2024-01-02  9:20 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: rekado, Timothy Sample, 39885, me

Hi Ludo,

On Fri, 22 Dec 2023 at 21:57, Ludovic Courtès <ludo@gnu.org> wrote:

> In hindsight this is not surprising: this is a Dec. 2019 commit and I
> set up <https://ci.guix.gnu.org/jobset/disarchive>,
> disarchive.guix.gnu.org, and related machinery in Sep/Oct 2021.
>
> Of course Timothy set up <https://disarchive.ngyro.com> earlier, but not
> much earlier, since Timothy started work on Disarchive ca. July 2020:
> <https://issues.guix.gnu.org/42162#15>.
>
> (Not that it helps but at least it’s a relief to know that this
> particular problem predates our more serious efforts.)

Yeah!  On the other hand, I wish Guix were able to build all, or at
least most of, the packages that time-machine is able to reach, say back
to Guix v1.0. :-)  Let’s be ambitious. ;-)

Cheers,
simon




^ permalink raw reply	[flat|nested] 31+ messages in thread

* bug#39885: Bioconductor tarballs are not archived
  2023-12-22 13:40   ` bug#39885: Bioconductor tarballs are not archived Ludovic Courtès
@ 2024-01-08  9:09     ` Simon Tournier
  2024-01-08 15:02       ` Ludovic Courtès
  2024-01-19 15:46     ` Timothy Sample
  1 sibling, 1 reply; 31+ messages in thread
From: Simon Tournier @ 2024-01-08  9:09 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: rekado, Timothy Sample, 39885, me

Hi,

On Fri, 22 Dec 2023 at 14:40, Ludovic Courtès <ludovic.courtes@inria.fr> wrote:

>> guix build: error: build of `/gnu/store/q9ggmh5a9bzmnr49p10x1w9sv6pzjarv-BiocNeighbors_1.4.1.tar.gz.drv' failed

First things first, please note that we are speaking about version 1.4.1
and not 1.20.0.  And this 1.4.1 has been gone from “our” infra since… ??
That’s one of the things I do not like with Guix: I never know what to
expect from the infra.  Anyway, I have my list of TODOs for fixing the
annoyances (that I, and maybe others, have :-)); stay tuned. ;-)

Considering the state of “our” infra and how Bioconductor manages the
tarballs, many tarballs are lost forever, sadly.  Although the content
is still around, I guess.


> I was wondering whether we’re now doing better for Bioconductor
> tarballs.  The answer, based on a small sample, seems to be “not quite”:

Thanks for diving into this.


> <https://guix.gnu.org/sources.json> has entries like:
>
> --8<---------------cut here---------------start------------->8---
>     {
>       "type": "url",
>       "urls": [
>         "https://bioconductor.org/packages/release/bioc/src/contrib/BiocNeighbors_1.20.0.tar.gz",
>         "https://bioconductor.org/packages/3.18/bioc/src/contrib/BiocNeighbors_1.20.0.tar.gz",
>         "https://bordeaux.guix.gnu.org/file/BiocNeighbors_1.20.0.tar.gz/sha256/0a5wg099fgwjbzd6r3mr4l02rcmjqlkdcz1w97qzwx1mir41fmas",
>         "https://ci.guix.gnu.org/file/BiocNeighbors_1.20.0.tar.gz/sha256/0a5wg099fgwjbzd6r3mr4l02rcmjqlkdcz1w97qzwx1mir41fmas",
>         "https://tarballs.nixos.org/sha256/0a5wg099fgwjbzd6r3mr4l02rcmjqlkdcz1w97qzwx1mir41fmas"
>       ],
>       "integrity": "sha256-WlUXSI41dP7xSTx81ibFsrIsACW5jmzaX5I/lxJ4vCg=",
>       "outputHashAlgo": "sha256",
>       "outputHashMode": "flat"
>     },
> --8<---------------cut here---------------end--------------->8---

Please note that Bioconductor 3.18 released BiocNeighbors v1.20.0 but
then updated it to v1.20.1, still under Bioconductor 3.18, and Ricardo
did this update with 5673484cbc2ed74c61ae81d623646fa7829fbc32.  On a side
note, between the Bioconductor update and the update on our side, there
is a window during which the source of r-biocneighbors is unreachable.

In other words, post-update on our side,

--8<---------------cut here---------------start------------->8---
$ zcat sources.json | jq | grep BiocNeighbors | grep bioconductor | sed 's/"//g' | sed 's/,//g'
        https://bioconductor.org/packages/release/bioc/src/contrib/BiocNeighbors_1.20.1.tar.gz
        https://bioconductor.org/packages/3.18/bioc/src/contrib/BiocNeighbors_1.20.1.tar.gz

$ for url in $(zcat sources.json | jq | grep BiocNeighbors | grep bioconductor | sed 's/"//g' | sed 's/,//g'); \
     do guix download $url ;done

Starting download of /tmp/guix-file.STc9fQ
From https://bioconductor.org/packages/release/bioc/src/contrib/BiocNeighbors_1.20.1.tar.gz...
 …_1.20.1.tar.gz  1015KiB                                                                                         60.3MiB/s 00:00 ▕██████████████████▏ 100.0%
/gnu/store/nxab1pskh9zcjspczph6jcs5fk79pb7k-BiocNeighbors_1.20.1.tar.gz
0w7hd6w0lmj1jaaq9zd5gwnnpkzcr0byqm5q584wjg4xgvsb981j

Starting download of /tmp/guix-file.aZFRLv
From https://bioconductor.org/packages/3.18/bioc/src/contrib/BiocNeighbors_1.20.1.tar.gz...
 …_1.20.1.tar.gz  1015KiB                                                                                         63.5MiB/s 00:00 ▕██████████████████▏ 100.0%
/gnu/store/nxab1pskh9zcjspczph6jcs5fk79pb7k-BiocNeighbors_1.20.1.tar.gz
0w7hd6w0lmj1jaaq9zd5gwnnpkzcr0byqm5q584wjg4xgvsb981j
--8<---------------cut here---------------end--------------->8---

but, now the past reads,

--8<---------------cut here---------------start------------->8---
$ for url in https://bioconductor.org/packages/release/bioc/src/contrib/BiocNeighbors_1.20.0.tar.gz \
             https://bioconductor.org/packages/3.18/bioc/src/contrib/BiocNeighbors_1.20.0.tar.gz ;  \
      do guix download $url ;done

Starting download of /tmp/guix-file.MUB3ow
From https://bioconductor.org/packages/release/bioc/src/contrib/BiocNeighbors_1.20.0.tar.gz...
download failed "https://bioconductor.org/packages/release/bioc/src/contrib/BiocNeighbors_1.20.0.tar.gz" 404 "Not Found"

Starting download of /tmp/guix-file.MUB3ow
From https://web.archive.org/web/20240102105016/https://bioconductor.org/packages/release/bioc/src/contrib/BiocNeighbors_1.20.0.tar.gz...
download failed "https://web.archive.org/web/20240102105016/https://bioconductor.org/packages/release/bioc/src/contrib/BiocNeighbors_1.20.0.tar.gz" 404 "NOT FOUND"
Trying to use Disarchive to assemble /tmp/guix-file.MUB3ow...
could not find its Disarchive specification
failed to download "/tmp/guix-file.MUB3ow" from "https://bioconductor.org/packages/release/bioc/src/contrib/BiocNeighbors_1.20.0.tar.gz"
guix download: error: https://bioconductor.org/packages/release/bioc/src/contrib/BiocNeighbors_1.20.0.tar.gz: download failed

Starting download of /tmp/guix-file.ZO9N08
From https://bioconductor.org/packages/3.18/bioc/src/contrib/BiocNeighbors_1.20.0.tar.gz...
download failed "https://bioconductor.org/packages/3.18/bioc/src/contrib/BiocNeighbors_1.20.0.tar.gz" 404 "Not Found"

Starting download of /tmp/guix-file.ZO9N08
From https://web.archive.org/web/20240102105018/https://bioconductor.org/packages/3.18/bioc/src/contrib/BiocNeighbors_1.20.0.tar.gz...
download failed "https://web.archive.org/web/20240102105018/https://bioconductor.org/packages/3.18/bioc/src/contrib/BiocNeighbors_1.20.0.tar.gz" 404 "NOT FOUND"
Trying to use Disarchive to assemble /tmp/guix-file.ZO9N08...
could not find its Disarchive specification
failed to download "/tmp/guix-file.ZO9N08" from "https://bioconductor.org/packages/3.18/bioc/src/contrib/BiocNeighbors_1.20.0.tar.gz"
guix download: error: https://bioconductor.org/packages/3.18/bioc/src/contrib/BiocNeighbors_1.20.0.tar.gz: download failed
--8<---------------cut here---------------end--------------->8---

As explained in [1], Bioconductor removed v1.20.0 from their URI scheme,
despite the fact that Bioconductor v3.18 had released v1.20.0, arf!  And
I do not even know if the tarball of v1.20.0 is kept on Bioconductor
infra.  Hum?
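
To summarise the URL shapes seen in this thread, here is a small Python
sketch that lists them for a given package.  The function name and its
parameters are hypothetical; the three patterns are copied from the
download logs quoted in this thread, and none of them is guaranteed to
resolve.

```python
BASE = "https://bioconductor.org/packages"


def candidate_urls(name, version, release, kind="bioc"):
    """List the Bioconductor tarball URL shapes seen in this thread."""
    tarball = f"{name}_{version}.tar.gz"
    return [
        # "release" is a moving target: it follows the latest release.
        f"{BASE}/release/{kind}/src/contrib/{tarball}",
        # Pinned to one Bioconductor release, e.g. "3.18".
        f"{BASE}/{release}/{kind}/src/contrib/{tarball}",
        # Archive location, as it appears in the 1.4.1 failure log.
        f"{BASE}/{release}/{kind}/src/contrib/Archive/{tarball}",
    ]


for url in candidate_urls("BiocNeighbors", "1.20.0", "3.18"):
    print(url)
```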

Hence the discussion we had: switch from url-fetch to git-fetch.
However, after some investigation, it does not seem straightforward:
the main issue is the current, almost automatic, updater.  See [2] for
details.


1: bug#39885: Bioconductor URI, fallback and time-machine
zimoun <zimon.toutoune@gmail.com>
Mon, 18 Jul 2022 18:03:04 +0200
id:87lesqmmrr.fsf@gmail.com
https://issues.guix.gnu.org/39885
https://issues.guix.gnu.org/msgid/87lesqmmrr.fsf@gmail.com
https://yhetil.org/guix/87lesqmmrr.fsf@gmail.com

2: bug#39885: Bioconductor URI, fallback and time-machine
Ricardo Wurmus <rekado@elephly.net>
Wed, 10 Aug 2022 20:25:00 +0200
id:878rnwuemq.fsf@elephly.net
https://issues.guix.gnu.org/39885
https://issues.guix.gnu.org/msgid/878rnwuemq.fsf@elephly.net
https://yhetil.org/guix/878rnwuemq.fsf@elephly.net


> Any volunteer to write a script that’d generate a list of Bioconductor
> content-addressed URLs (the bordeaux.guix.gnu.org/file ones) for say the
> past couple of years?

I did some work on that last week.  I will report on it this week.

Cheers,
simon




^ permalink raw reply	[flat|nested] 31+ messages in thread

* bug#39885: Bioconductor tarballs are not archived
  2024-01-08  9:09     ` Simon Tournier
@ 2024-01-08 15:02       ` Ludovic Courtès
  2024-01-10 12:41         ` Ricardo Wurmus
  0 siblings, 1 reply; 31+ messages in thread
From: Ludovic Courtès @ 2024-01-08 15:02 UTC (permalink / raw)
  To: Simon Tournier; +Cc: rekado, Timothy Sample, 39885, me

Hi!

Simon Tournier <zimon.toutoune@gmail.com> skribis:

>> I was wondering whether we’re now doing better for Bioconductor
>> tarballs.  The answer, based on a small sample, seems to be “not quite”:

[...]

> but, now the past reads,
>
> $ for url in https://bioconductor.org/packages/release/bioc/src/contrib/BiocNeighbors_1.20.0.tar.gz \
>              https://bioconductor.org/packages/3.18/bioc/src/contrib/BiocNeighbors_1.20.0.tar.gz ;  \
>       do guix download $url ;done

Thanks for investigating & explaining!

In my previous message, I wrote:

> As for past tarballs, #swh-devel comrades say we could send them a list
> of URLs and they’d create “Save Code Now” requests on our behalf (we
> cannot do it ourselves since the site doesn’t accept plain tarballs.)

Were you able to retrieve some of these?  What are the chances of
success?

> Hence the discussion we had: switch from url-fetch to git-fetch.
> However, after some investigation, it does not seem straightforward:
> the main issue is the current, almost automatic, updater.  See [2] for
> details.

[...]

> https://issues.guix.gnu.org/msgid/878rnwuemq.fsf@elephly.net

Indeed, thanks for the link.  I agree that long-term moving to
‘git-fetch’ sounds preferable, but there are quite a few obstacles to
overcome.

Ludo’.




^ permalink raw reply	[flat|nested] 31+ messages in thread

* bug#39885: Bioconductor URI, fallback and time-machine
  2022-08-10 18:25     ` Ricardo Wurmus
  2022-08-10 19:44       ` Maxime Devos
  2022-09-09 17:23       ` zimoun
@ 2024-01-08 15:07       ` Ludovic Courtès
  2024-01-08 15:34         ` Ricardo Wurmus
  2 siblings, 1 reply; 31+ messages in thread
From: Ludovic Courtès @ 2024-01-08 15:07 UTC (permalink / raw)
  To: Ricardo Wurmus; +Cc: Timothy Sample, 39885, me, zimoun

Hi,

(Replying to a 1.5-year-old message…)

Ricardo Wurmus <rekado@elephly.net> skribis:

> I have finally taken the time to review this and implement a first draft
> of a change to the bioconductor importer and updater.
>
> There are some limitations:
>
> - we cannot use the updater to go from “url-fetch” to “git-fetch”.
>   That’s because “package-update” in (guix upstream) decides whether to
>   use package-update/url-fetch or package-update/git-fetch based on the
>   *current* package value’s origin fetch procedure.  For the switch we
>   can hack around this (adding an exception for bioconductor packages),
>   but there is no pretty way to do this in a generic fashion that could
>   be committed.
>
>   Perhaps we could operate on the url included in the <upstream-source>
>   instead of looking at the *current* package value.  We’re only
>   accessing “package” once in the url-fetch case, so maybe we can work
>   around this problem.

Alternatively, how about writing a custom one-shot tool to change the
‘source’ field of all the Bioconductor packages to ‘git-fetch’?

It may be easier than adjusting (guix upstream) to cater to this
probably unusual case.

> - the repositories at https://git.bioconductor.org/package/NAME do not
>   tag package versions.  The only method of organization is branches
>   that are named after *Bioconductor releases* (not package releases),
>   e.g. RELEASE_3_15.  We can only determine the package version by
>   reading its DESCRIPTION file or by looking up the version index for
>   all Bioconductor packages (we do that already).  This means that there
>   could be different commits for the same package version in the same
>   release branch — so we have to include the commit hash and a revision
>   counter in the version string.

OK, sounds acceptable.

> - the updater doesn’t work on version expressions like (git-version
>   "1.12" revision commit).  It expects to be able to replace literal
>   strings.  Because of that my changes let the importer generate a
>   string literal such as "1.12-0.cafebab" without a let-bound commit
>   string.

Maybe we can build upon Maxime’s patch at
<https://issues.guix.gnu.org/53144#13>?

> - “experiment” or “data” packages are not kept in Git.  They only exist
>   as volatile tarballs that will be overwritten.  Thankfully, they don’t
>   change all that often, so they have a good chance of making it into
>   our archives.
>
> - the above exception means that we need to litter the importer and
>   updater code with extra checks.
>
> With all these notes out of the way I’ll prepare a series of patches
> next.

I don’t think it happened but it’d still be nice.  :-)

Thanks,
Ludo’.




^ permalink raw reply	[flat|nested] 31+ messages in thread

* bug#39885: Bioconductor URI, fallback and time-machine
  2024-01-08 15:07       ` Ludovic Courtès
@ 2024-01-08 15:34         ` Ricardo Wurmus
  2024-01-11 16:11           ` Simon Tournier
  0 siblings, 1 reply; 31+ messages in thread
From: Ricardo Wurmus @ 2024-01-08 15:34 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Timothy Sample, 39885, me, zimoun


Ludovic Courtès <ludovic.courtes@inria.fr> writes:

>> With all these notes out of the way I’ll prepare a series of patches
>> next.
>
> I don’t think it happened but it’d still be nice.  :-)

The WIP commit is here:

https://git.savannah.gnu.org/cgit/guix.git/commit/?h=wip-r&id=e81a75a7b28c633a658ceeb0a728255674f56c58

-- 
Ricardo




^ permalink raw reply	[flat|nested] 31+ messages in thread

* bug#39885: Bioconductor tarballs are not archived
  2024-01-08 15:02       ` Ludovic Courtès
@ 2024-01-10 12:41         ` Ricardo Wurmus
  2024-01-10 15:23           ` Simon Tournier
  0 siblings, 1 reply; 31+ messages in thread
From: Ricardo Wurmus @ 2024-01-10 12:41 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Timothy Sample, 39885, me, Simon Tournier


Ludovic Courtès <ludovic.courtes@inria.fr> writes:

> In my previous message, I wrote:
>
>> As for past tarballs, #swh-devel comrades say we could send them a list
>> of URLs and they’d create “Save Code Now” requests on our behalf (we
>> cannot do it ourselves since the site doesn’t accept plain tarballs.)
>
> Were you able to retrieve some of these?  What are the chances of
> success?

Do we have a list of desired tarballs?  I still have an archived
/gnu/store from before we moved the shared cluster installation at the
MDC to different storage.  It might contain old tarballs that we no
longer have elsewhere.

-- 
Ricardo





* bug#39885: Bioconductor tarballs are not archived
  2024-01-10 12:41         ` Ricardo Wurmus
@ 2024-01-10 15:23           ` Simon Tournier
  0 siblings, 0 replies; 31+ messages in thread
From: Simon Tournier @ 2024-01-10 15:23 UTC (permalink / raw)
  To: Ricardo Wurmus; +Cc: Timothy Sample, Ludovic Courtès, me, 39885

Hi,

On Wed, 10 Jan 2024 at 13:42, Ricardo Wurmus <rekado@elephly.net> wrote:

> Do we have a list of desired tarballs?

I made one. :-)  I will share the manifest once it is a bit more polished. ;-)

Roughly, it goes back to 3.11 (for earlier releases, I am still fighting
with time-machine) and covers only the versions right before a
Bioconductor upgrade (sadly not the versions right after such an upgrade,
many of which have disappeared from Bioconductor).  If you have them,
cool!  Well, for now, I am speaking about all the Bioconductor packages
except annotations for these Guix revisions:
--8<---------------cut here---------------start------------->8---
git log --format="%P %s" --after=2019-04-30                     \
    | grep -i -E 'bioconductor' | grep -i -E '(update|upgrade)' \
    | cut -f1 -d' '                                             \
    | head -8
--8<---------------cut here---------------end--------------->8---

And for instance, I have this kind of sources.json.

--8<---------------cut here---------------start------------->8---
{
  "sources": [
    {
      "type": "url",
      "urls": [
        "https://bordeaux.guix.gnu.org/file/progeny_1.22.0.tar.gz/sha256/047x6by3xa15gvi3kny5pkqxaq8d2kzcfi55ic5j7a351715l6l7",
        "https://ci.guix.gnu.org/file/progeny_1.22.0.tar.gz/sha256/047x6by3xa15gvi3kny5pkqxaq8d2kzcfi55ic5j7a351715l6l7",
        "https://bioconductor.org/packages/3.17/bioc/src/contrib/progeny_1.22.0.tar.gz",
        "https://tarballs.nixos.org/sha256/047x6by3xa15gvi3kny5pkqxaq8d2kzcfi55ic5j7a351715l6l7"
      ],
      "integrity": "sha256-hxpawgllqCMLi6VEx/4UDWHV8bzF2znifiWoPvwy/RA=",
      "outputHashAlgo": "sha256",
      "outputHashMode": "flat"
    },
    {
      "type": "url",
      "urls": [
        "https://bordeaux.guix.gnu.org/file/AWFisher_1.14.0.tar.gz/sha256/1c6rr1z1rhvn8w1kb3nnjlfacfr22vwm1rsa1xqm2hmghs01bq4x",
        "https://ci.guix.gnu.org/file/AWFisher_1.14.0.tar.gz/sha256/1c6rr1z1rhvn8w1kb3nnjlfacfr22vwm1rsa1xqm2hmghs01bq4x",
        "https://bioconductor.org/packages/3.17/bioc/src/contrib/AWFisher_1.14.0.tar.gz",
        "https://tarballs.nixos.org/sha256/1c6rr1z1rhvn8w1kb3nnjlfacfr22vwm1rsa1xqm2hmghs01bq4x"
      ],
      "integrity": "sha256-neAVgIavQlFxD0rnUPkWIjumHJXWjjUDR3bDHH7I2bA=",
      "outputHashAlgo": "sha256",
      "outputHashMode": "flat"
    },
--8<---------------cut here---------------end--------------->8---

Where the URL for ci.guix is incorrect, I guess.
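
For what it is worth, the hash embedded in the content-addressed URLs
and the "integrity" field appear to be the same SHA-256 digest in two
encodings: nix-base32 in the URL and base64 (Subresource Integrity
style) in the JSON.  A minimal Python sketch of the conversion, assuming
the usual nix-base32 alphabet and digit order; the function names are
mine for illustration, not Guix's:

```python
import base64

# Alphabet used by nix-base32 (it omits 'e', 'o', 't' and 'u').
NIX_BASE32_ALPHABET = "0123456789abcdfghijklmnpqrsvwxyz"


def nix_base32_to_bytes(text, size=32):
    """Decode a nix-base32 string into SIZE raw bytes.

    The string is read as a base-32 number (most significant digit
    first) whose value is the digest taken as a little-endian integer.
    """
    number = 0
    for char in text:
        number = number * 32 + NIX_BASE32_ALPHABET.index(char)
    return number.to_bytes(size, "little")


def subresource_integrity(nix_base32_sha256):
    """Return the 'sha256-<base64>' form of a nix-base32 SHA-256 digest."""
    digest = nix_base32_to_bytes(nix_base32_sha256)
    return "sha256-" + base64.b64encode(digest).decode("ascii")


# Hash taken from the progeny entry above.
print(subresource_integrity(
    "047x6by3xa15gvi3kny5pkqxaq8d2kzcfi55ic5j7a351715l6l7"))
```

Running it on the progeny hash should give back the "integrity" value
of that entry, which is a cheap consistency check for such a file.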

Cheers,
simon





* bug#39885: Bioconductor URI, fallback and time-machine
  2024-01-08 15:34         ` Ricardo Wurmus
@ 2024-01-11 16:11           ` Simon Tournier
  0 siblings, 0 replies; 31+ messages in thread
From: Simon Tournier @ 2024-01-11 16:11 UTC (permalink / raw)
  To: Ricardo Wurmus, Ludovic Courtès; +Cc: Timothy Sample, 39885, me

Hi,

On Mon, 08 Jan 2024 at 16:34, Ricardo Wurmus <rekado@elephly.net> wrote:

> The WIP commit is here:
>
> https://git.savannah.gnu.org/cgit/guix.git/commit/?h=wip-r&id=e81a75a7b28c633a658ceeb0a728255674f56c58

IIRC, the main feedback [1] on this approach is:

        - the repositories at https://git.bioconductor.org/package/NAME do not
          tag package versions.  The only method of organization is branches
          that are named after *Bioconductor releases* (not package releases),
          e.g. RELEASE_3_15.  We can only determine the package version by
          reading its DESCRIPTION file or by looking up the version index for
          all Bioconductor packages (we do that already).  This means that there
          could be different commits for the same package version in the same
          release branch — so we have to include the commit hash and a revision
          counter in the version string.

Have you tried the wip commit at scale?
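
To make the version-string point concrete, here is a rough Python
sketch, not what the WIP commit does: it reads the package version from
a DESCRIPTION file and appends a revision counter and a short commit
hash, in the spirit of Guix's 'git-version'.  The DESCRIPTION content,
the commit identifiers and the function names are all made up for
illustration.

```python
def description_version(description_text):
    """Return the value of the 'Version:' field of an R DESCRIPTION file."""
    for line in description_text.splitlines():
        if line.startswith("Version:"):
            return line.split(":", 1)[1].strip()
    raise ValueError("no Version field found")


def versioned_snapshot(version, revision, commit):
    """Build 'VERSION-REVISION.SHORTCOMMIT' from a package version,
    a revision counter and a full commit hash."""
    return f"{version}-{revision}.{commit[:7]}"


# Hypothetical DESCRIPTION content and commit hashes.
description = "Package: progeny\nVersion: 1.22.0\n"
version = description_version(description)

# Two different commits on the same release branch, same package version:
# only the revision counter and the commit hash tell them apart.
first = versioned_snapshot(version, 1, "0123456789abcdef0123456789abcdef01234567")
second = versioned_snapshot(version, 2, "fedcba9876543210fedcba9876543210fedcba98")
print(first)
print(second)
```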

Cheers,
simon



1:bug#39885: Bioconductor URI, fallback and time-machine
Ricardo Wurmus <rekado@elephly.net>
Wed, 10 Aug 2022 20:25:00 +0200
id:878rnwuemq.fsf@elephly.net
https://issues.guix.gnu.org/39885
https://issues.guix.gnu.org/msgid/878rnwuemq.fsf@elephly.net
https://yhetil.org/guix/878rnwuemq.fsf@elephly.net





* bug#39885: Bioconductor tarballs are not archived
  2023-12-22 13:40   ` bug#39885: Bioconductor tarballs are not archived Ludovic Courtès
  2024-01-08  9:09     ` Simon Tournier
@ 2024-01-19 15:46     ` Timothy Sample
  2024-01-23  9:10       ` Ludovic Courtès
  2024-02-14 15:23       ` Simon Tournier
  1 sibling, 2 replies; 31+ messages in thread
From: Timothy Sample @ 2024-01-19 15:46 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: rekado, 39885, me, zimoun

[-- Attachment #1: Type: text/plain, Size: 1372 bytes --]

Hello,

Ludovic Courtès <ludovic.courtes@inria.fr> writes:

> As for past tarballs, #swh-devel comrades say we could send them a list
> of URLs and they’d create “Save Code Now” requests on our behalf (we
> cannot do it ourselves since the site doesn’t accept plain tarballs.)
>
> Any volunteer to write a script that’d generate a list of Bioconductor
> content-addressed URLs (the bordeaux.guix.gnu.org/file ones) for say the
> past couple of years?

Sorry I’m a little late to this party, but I wrote a similar script a
while ago.  It creates a “sources.json” file of all the sources that the
PoG database analyzed and found missing in SWH.  It only covers what PoG
monitors (which is *almost* everything, but not quite).

  $ git clone https://git.ngyro.com/preservation-of-guix
  $ cd preservation-of-guix
  $ wget https://ngyro.com/pog-reports/latest/pog.db

  [Wait a long time because my server is sloooow.]

  $ guile -L . etc/sources.scm pog.db > missing-sources.json

With some modifications, I used it to generate the attached list of
Bioconductor sources (based off of recent, unpublished PoG data).  I’ve
also attached the modifications in case anyone is curious or wants to
make a similar list.  I will publish the PoG database soon (today?), so
maybe wait for that before generating any lists.
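
For readers who do not read Scheme, the gist of the attached
modification can be sketched in Python as follows.  This is only an
approximation of what the patch does, with hypothetical function names;
the example values are the progeny ones quoted earlier in the thread.

```python
import json
import posixpath
from urllib.parse import urlparse


def bioconductor_filename(urls):
    """Return the file name of the first URL hosted on bioconductor.org."""
    for url in urls:
        parsed = urlparse(url)
        if (parsed.hostname or "").endswith("bioconductor.org"):
            return posixpath.basename(parsed.path)
    raise ValueError(f"not a bioconductor.org reference: {urls!r}")


def url_source(urls, nix_base32_digest, integrity):
    """Build one 'sources.json' entry pointing at the content-addressed
    copy of the tarball on bordeaux.guix.gnu.org."""
    filename = bioconductor_filename(urls)
    url = ("https://bordeaux.guix.gnu.org/file/"
           f"{filename}/sha256/{nix_base32_digest}")
    return {"type": "url", "urls": [url], "integrity": integrity}


entry = url_source(
    ["https://bioconductor.org/packages/3.17/bioc/src/contrib/progeny_1.22.0.tar.gz"],
    "047x6by3xa15gvi3kny5pkqxaq8d2kzcfi55ic5j7a351715l6l7",
    "sha256-hxpawgllqCMLi6VEx/4UDWHV8bzF2znifiWoPvwy/RA=",
)
print(json.dumps({"sources": [entry]}, indent=2))
```

The upstream Bioconductor URL is used only to recover the file name;
the URL that ends up in the list is the content-addressed one.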


-- Tim


[-- Attachment #2: bioconductor-sources.json.gz --]
[-- Type: application/octet-stream, Size: 50553 bytes --]

[-- Attachment #3: bioconductor.patch --]
[-- Type: text/x-patch, Size: 2040 bytes --]

diff --git a/etc/sources.scm b/etc/sources.scm
index 71d157d..515cf00 100644
--- a/etc/sources.scm
+++ b/etc/sources.scm
@@ -1,5 +1,5 @@
 ;;; Preservation of Guix
-;;; Copyright © 2022 Timothy Sample <samplet@ngyro.com>
+;;; Copyright © 2022, 2024 Timothy Sample <samplet@ngyro.com>
 ;;;
 ;;; This file is part of Preservation of Guix.
 ;;;
@@ -61,6 +61,7 @@ FROM fods f
 WHERE f.algorithm = 'sha256'
     AND (fr.reference LIKE '\"%'
         OR fr.reference LIKE '(\"%')
+    AND fr.reference LIKE '%bioconductor.org%'
     AND NOT fr.is_error
     AND f.is_in_swh IS NOT NULL
     AND NOT f.is_in_swh")
@@ -85,22 +86,25 @@ Subresource Integrity metadata value."
   (define b64 (base64-encode bv))
   (string-append "sha256-" b64))
 
-(define (web-reference-urls reference)
+(define (web-reference-filename reference)
   (define uris
     (match (call-with-input-string reference read)
       ((urls ...) (map string->uri urls))
       (url (list (string->uri url)))))
-  (append-map (lambda (uri)
-                (map uri->string
-                     (maybe-expand-mirrors uri %mirrors)))
-              uris))
+  (or (any (lambda (uri)
+             (and (string-suffix? "bioconductor.org" (uri-host uri))
+                  (basename (uri-path uri))))
+           uris)
+      (error "Not a 'bioconductor.org' reference" reference)))
 
 (define (record->url-source rec)
   (match-let ((#(digest reference) rec))
-    (let ((urls (web-reference-urls reference))
-          (integrity (nix-base32-sha256->subresource-integrity digest)))
+    (let* ((filename (web-reference-filename reference))
+           (url (string-append "https://bordeaux.guix.gnu.org/file/"
+                               filename "/sha256/" digest))
+           (integrity (nix-base32-sha256->subresource-integrity digest)))
       `(("type" . "url")
-        ("urls" . ,(list->vector urls))
+        ("urls" . ,(vector url))
         ("integrity" . ,integrity)))))
 
 (define (lookup-missing-sources db)


* bug#39885: Bioconductor tarballs are not archived
  2024-01-19 15:46     ` Timothy Sample
@ 2024-01-23  9:10       ` Ludovic Courtès
  2024-02-14 15:23       ` Simon Tournier
  1 sibling, 0 replies; 31+ messages in thread
From: Ludovic Courtès @ 2024-01-23  9:10 UTC (permalink / raw)
  To: Timothy Sample; +Cc: rekado, 39885, me, zimoun

Hi Timothy,

Timothy Sample <samplet@ngyro.com> skribis:

> Hello,
>
> Ludovic Courtès <ludovic.courtes@inria.fr> writes:
>
>> As for past tarballs, #swh-devel comrades say we could send them a list
>> of URLs and they’d create “Save Code Now” requests on our behalf (we
>> cannot do it ourselves since the site doesn’t accept plain tarballs.)
>>
>> Any volunteer to write a script that’d generate a list of Bioconductor
>> content-addressed URLs (the bordeaux.guix.gnu.org/file ones) for say the
>> past couple of years?
>
> Sorry I’m a little late to this party, but I wrote a similar script a
> while ago.  It creates a “sources.json” file of all the sources that the
> PoG database analyzed and found missing in SWH.  It only covers what PoG
> monitors (which is *almost* everything, but not quite).

Excellent!

> With some modifications, I used it to generate the attached list of
> Bioconductor sources (based off of recent, unpublished PoG data).  I’ve
> also attached the modifications in case anyone is curious or wants to
> make a similar list.  I will publish the PoG database soon (today?), so
> maybe wait for that before generating any lists.

After discussing it on #swh-devel, I filed this issue:

  https://gitlab.softwareheritage.org/swh/infra/sysadm-environment/-/issues/5222

Tim, you were planning to offer a larger list of missing sources
extracted from the PoG database, right?

Thank you!

Ludo’.





* bug#39885: Bioconductor tarballs are not archived
  2024-01-19 15:46     ` Timothy Sample
  2024-01-23  9:10       ` Ludovic Courtès
@ 2024-02-14 15:23       ` Simon Tournier
  2024-02-16 16:14         ` Timothy Sample
  1 sibling, 1 reply; 31+ messages in thread
From: Simon Tournier @ 2024-02-14 15:23 UTC (permalink / raw)
  To: Timothy Sample, Ludovic Courtès; +Cc: rekado, 39885, me

Hi,

On ven., 19 janv. 2024 at 09:46, Timothy Sample <samplet@ngyro.com> wrote:

>   $ git clone https://git.ngyro.com/preservation-of-guix
>   $ cd preservation-of-guix
>   $ wget https://ngyro.com/pog-reports/latest/pog.db
>
>   [Wait a long time because my server is sloooow.]
>
>   $ guile -L . etc/sources.scm pog.db > missing-sources.json

Cool!

Can we consider that this report is now done?  Because:

1. SWH supports ExtID and nar hash lookup.

2. Missing origins are currently ingested by SWH.
   (via specific sources.json)


Cheers,
simon






* bug#39885: Bioconductor tarballs are not archived
  2024-02-14 15:23       ` Simon Tournier
@ 2024-02-16 16:14         ` Timothy Sample
  2024-02-19 16:50           ` Simon Tournier
  0 siblings, 1 reply; 31+ messages in thread
From: Timothy Sample @ 2024-02-16 16:14 UTC (permalink / raw)
  To: Simon Tournier; +Cc: rekado, Ludovic Courtès, me, 39885

Simon Tournier <zimon.toutoune@gmail.com> writes:

> Cool!
>
> Can we consider that this report is now done?  Because:
>
> 1. SWH supports ExtID and nar hash lookup.
>
> 2. Missing origins are currently ingested by SWH.
>    (via specific sources.json)

I think that would be jumping the gun a little bit.

In some sense, the report is only *done* when “stored” hits 100% (or
close to it, with the remainder being stuff we are pretty sure no longer
exists).  This won’t happen just because of your second point there.
When the historical “sources.json” is loaded, things will be much, much,
better, sure.  Sources will still be missing, though.  To me, this is an
invitation to more subtle analysis, like weighing sources by their
“importance” in the package graph.  Then there’s still shortcomings with
Disarchive that have to be resolved (which is work best guided by
numbers in the report).

Also, it will always be a good idea to verify that things are working.
Ideally this could be simpler (leveraging ExtID lookup) and continuous.


-- Tim





* bug#39885: Bioconductor tarballs are not archived
  2024-02-16 16:14         ` Timothy Sample
@ 2024-02-19 16:50           ` Simon Tournier
  2024-02-21 18:16             ` Timothy Sample
  0 siblings, 1 reply; 31+ messages in thread
From: Simon Tournier @ 2024-02-19 16:50 UTC (permalink / raw)
  To: Timothy Sample; +Cc: rekado, Ludovic Courtès, me, 39885

Hi,

On ven., 16 févr. 2024 at 10:14, Timothy Sample <samplet@ngyro.com> wrote:

>> Can we consider that this report is now done?  Because:
>>
>> 1. SWH supports ExtID and nar hash lookup.
>>
>> 2. Missing origins are currently ingested by SWH.
>>    (via specific sources.json)
>
> I think that would be jumping the gun a little bit.
>
> In some sense, the report is only *done* when “stored” hits 100% (or
> close to it, with the remainder being stuff we are pretty sure no longer
> exists).  This won’t happen just because of your second point there.

Just to be sure: we are speaking about Bioconductor only, right?


> When the historical “sources.json” is loaded, things will be much, much,
> better, sure.  Sources will still be missing, though.

Yeah, sources will still be missing but I expect that the Bioconductor
ones will not be.  The only issue is about “annotation” and maybe
“experiment”.  However, here we are hitting the boundary between code
and data: annotation and experiment packages might be very large and
potentially skipped by SWH, and they contain little if any code, just
plain data.

We can still discuss what to do here; in this already long thread. :-)
Or we can open another thread for this specific case about Bioconductor
annotation and experiment.

>                                                       To me, this is an
> invitation to more subtle analysis, like weighing sources by their
> “importance” in the package graph.  Then there’s still shortcomings with
> Disarchive that have to be resolved (which is work best guided by
> numbers in the report).

Yeah.  But that seems a larger scope than the Bioconductor case, no?


> Also, it will always be a good idea to verify that things are working.
> Ideally this could be simpler (leveraging ExtID lookup) and continuous.

Indeed, checking that all Bioconductor sources can be extracted from
SWH+Disarchive seems the path forward to closing this report. :-)

Cheers,
simon





* bug#39885: Bioconductor tarballs are not archived
  2024-02-19 16:50           ` Simon Tournier
@ 2024-02-21 18:16             ` Timothy Sample
  0 siblings, 0 replies; 31+ messages in thread
From: Timothy Sample @ 2024-02-21 18:16 UTC (permalink / raw)
  To: Simon Tournier; +Cc: rekado, Ludovic Courtès, me, 39885

Hi Simon,

Simon Tournier <zimon.toutoune@gmail.com> writes:

> Just to be sure: we are speaking about Bioconductor only, right?

I took “this report” to be the PoG report not the bug report.  My
mistake!  Sorry for the confusion.


-- Tim





end of thread, other threads:[~2024-02-21 18:29 UTC | newest]

Thread overview: 31+ messages
2020-03-03 15:59 bug#39885: Bioconductor URI, fallback and time-machine zimoun
2020-03-23 21:20 ` Ricardo Wurmus
2020-05-21 23:29   ` zimoun
2020-06-24 11:07 ` zimoun
2020-06-28 20:14   ` Ludovic Courtès
2020-06-29 17:36     ` zimoun
2020-06-29 20:42       ` Ludovic Courtès
2020-11-19 14:22 ` zimoun
2021-11-22 19:48 ` zimoun
2022-07-18 16:03 ` zimoun
2022-07-18 16:21   ` Ricardo Wurmus
2022-08-10 18:25     ` Ricardo Wurmus
2022-08-10 19:44       ` Maxime Devos
2022-08-10 19:48         ` Maxime Devos
2022-09-09 17:23       ` zimoun
2024-01-08 15:07       ` Ludovic Courtès
2024-01-08 15:34         ` Ricardo Wurmus
2024-01-11 16:11           ` Simon Tournier
2023-12-22 13:40   ` bug#39885: Bioconductor tarballs are not archived Ludovic Courtès
2024-01-08  9:09     ` Simon Tournier
2024-01-08 15:02       ` Ludovic Courtès
2024-01-10 12:41         ` Ricardo Wurmus
2024-01-10 15:23           ` Simon Tournier
2024-01-19 15:46     ` Timothy Sample
2024-01-23  9:10       ` Ludovic Courtès
2024-02-14 15:23       ` Simon Tournier
2024-02-16 16:14         ` Timothy Sample
2024-02-19 16:50           ` Simon Tournier
2024-02-21 18:16             ` Timothy Sample
2023-12-22 20:57   ` bug#39885: Bioconductor URI, fallback and time-machine Ludovic Courtès
2024-01-02  9:20     ` Simon Tournier
