unofficial mirror of bug-guix@gnu.org 
From: Timothy Sample <samplet@ngyro.com>
To: "Ludovic Courtès" <ludovic.courtes@inria.fr>
Cc: rekado@elephly.net, 39885@debbugs.gnu.org, me@tobias.gr,
	zimoun <zimon.toutoune@gmail.com>
Subject: bug#39885: Bioconductor tarballs are not archived
Date: Fri, 19 Jan 2024 09:46:21 -0600
Message-ID: <8734utba2a.fsf@ngyro.com>
In-Reply-To: <874jgacq4u.fsf_-_@gnu.org> ("Ludovic Courtès"'s message of "Fri, 22 Dec 2023 14:40:01 +0100")

[-- Attachment #1: Type: text/plain, Size: 1372 bytes --]

Hello,

Ludovic Courtès <ludovic.courtes@inria.fr> writes:

> As for past tarballs, #swh-devel comrades say we could send them a list
> of URLs and they’d create “Save Code Now” requests on our behalf (we
> cannot do it ourselves since the site doesn’t accept plain tarballs.)
>
> Any volunteer to write a script that’d generate a list of Bioconductor
> content-addressed URLs (the bordeaux.guix.gnu.org/file ones) for say the
> past couple of years?

Sorry I’m a little late to this party, but I wrote a similar script a
while ago.  It creates a “sources.json” file listing every source that
Preservation of Guix (PoG) has analyzed and found missing from Software
Heritage (SWH).  It only covers what PoG monitors (which is *almost*
everything, but not quite).

  $ git clone https://git.ngyro.com/preservation-of-guix
  $ cd preservation-of-guix
  $ wget https://ngyro.com/pog-reports/latest/pog.db

  [Wait a long time because my server is sloooow.]

  $ guile -L . etc/sources.scm pog.db > missing-sources.json

With some modifications, I used it to generate the attached list of
Bioconductor sources (based off of recent, unpublished PoG data).  I’ve
also attached the modifications in case anyone is curious or wants to
make a similar list.  I will publish the PoG database soon (today?), so
maybe wait for that before generating any lists.
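
For anyone skimming the patch: each entry in the list ends up with a
single URL of the form “.../file/FILENAME/sha256/DIGEST”, where DIGEST
is the nix-base32 SHA-256 hash.  A minimal shell sketch of that
construction follows; the function name and the arguments are made up
for illustration and are not part of PoG or Guix.

```shell
#!/bin/sh
# Hypothetical helper (not part of PoG or Guix): build the content-addressed
# URL for a source from its file name and its nix-base32 SHA-256 digest.
bordeaux_url() {
    printf 'https://bordeaux.guix.gnu.org/file/%s/sha256/%s\n' "$1" "$2"
}

# Placeholder arguments, not a real Bioconductor tarball or digest.
bordeaux_url "example_1.0.0.tar.gz" \
    "0000000000000000000000000000000000000000000000000000"
```

The patch does the same thing in Scheme, taking the file name from the
first “bioconductor.org” URL it finds in each reference.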


-- Tim


[-- Attachment #2: bioconductor-sources.json.gz --]
[-- Type: application/octet-stream, Size: 50553 bytes --]

[-- Attachment #3: bioconductor.patch --]
[-- Type: text/x-patch, Size: 2040 bytes --]

diff --git a/etc/sources.scm b/etc/sources.scm
index 71d157d..515cf00 100644
--- a/etc/sources.scm
+++ b/etc/sources.scm
@@ -1,5 +1,5 @@
 ;;; Preservation of Guix
-;;; Copyright © 2022 Timothy Sample <samplet@ngyro.com>
+;;; Copyright © 2022, 2024 Timothy Sample <samplet@ngyro.com>
 ;;;
 ;;; This file is part of Preservation of Guix.
 ;;;
@@ -61,6 +61,7 @@ FROM fods f
 WHERE f.algorithm = 'sha256'
     AND (fr.reference LIKE '\"%'
         OR fr.reference LIKE '(\"%')
+    AND fr.reference LIKE '%bioconductor.org%'
     AND NOT fr.is_error
     AND f.is_in_swh IS NOT NULL
     AND NOT f.is_in_swh")
@@ -85,22 +86,25 @@ Subresource Integrity metadata value."
   (define b64 (base64-encode bv))
   (string-append "sha256-" b64))
 
-(define (web-reference-urls reference)
+(define (web-reference-filename reference)
   (define uris
     (match (call-with-input-string reference read)
       ((urls ...) (map string->uri urls))
       (url (list (string->uri url)))))
-  (append-map (lambda (uri)
-                (map uri->string
-                     (maybe-expand-mirrors uri %mirrors)))
-              uris))
+  (or (any (lambda (uri)
+             (and (string-suffix? "bioconductor.org" (uri-host uri))
+                  (basename (uri-path uri))))
+           uris)
+      (error "Not a 'bioconductor.org' reference" reference)))
 
 (define (record->url-source rec)
   (match-let ((#(digest reference) rec))
-    (let ((urls (web-reference-urls reference))
-          (integrity (nix-base32-sha256->subresource-integrity digest)))
+    (let* ((filename (web-reference-filename reference))
+           (url (string-append "https://bordeaux.guix.gnu.org/file/"
+                               filename "/sha256/" digest))
+           (integrity (nix-base32-sha256->subresource-integrity digest)))
       `(("type" . "url")
-        ("urls" . ,(list->vector urls))
+        ("urls" . ,(vector url))
         ("integrity" . ,integrity)))))
 
 (define (lookup-missing-sources db)
