unofficial mirror of guix-patches@gnu.org 
 help / color / mirror / code / Atom feed
From: "Ludovic Courtès" <ludo@gnu.org>
To: 37224@debbugs.gnu.org
Subject: [bug#37224] [PATCH 4/4] lint: Add 'archival' checker.
Date: Fri, 30 Aug 2019 01:21:01 +0200	[thread overview]
Message-ID: <20190829232101.8153-4-ludo@gnu.org> (raw)
In-Reply-To: <20190829232101.8153-1-ludo@gnu.org>

* guix/lint.scm (check-archival): New procedure.
(%network-dependent-checkers): Add 'archival' checker.
* tests/lint.scm ("archival: missing content")
("archival: content available")
("archival: missing revision")
("archival: revision available")
("archival: rate limit reached"): New tests.
* doc/guix.texi (Invoking guix lint): Document it.
---
 doc/guix.texi  | 25 +++++++++++++
 guix/lint.scm  | 96 +++++++++++++++++++++++++++++++++++++++++++++++++-
 tests/lint.scm | 81 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 201 insertions(+), 1 deletion(-)

diff --git a/doc/guix.texi b/doc/guix.texi
index 707c2ba700..582f3a124b 100644
--- a/doc/guix.texi
+++ b/doc/guix.texi
@@ -9233,6 +9233,31 @@ Parse the @code{source} URL to determine if a tarball from GitHub is
 autogenerated or if it is a release tarball.  Unfortunately GitHub's
 autogenerated tarballs are sometimes regenerated.
 
+@item archival
+@cindex Software Heritage, source code archive
+@cindex archival of source code, Software Heritage
+Checks whether the package's source code is archived at
+@uref{https://www.softwareheritage.org, Software Heritage}.
+
+When the source code that is not archived comes from a version-control system
+(VCS)---e.g., it's obtained with @code{git-fetch}, send Software Heritage a
+``save'' request so that it eventually archives it.  This ensures that the
+source will remain available in the long term, and that Guix can fall back to
+Software Heritage should the source code disappear from its original host.
+The status of recent ``save'' requests can be
+@uref{https://archive.softwareheritage.org/save/#requests, viewed on-line}.
+
+When source code is a tarball obtained with @code{url-fetch}, simply print a
+message when it is not archived.  As of this writing, Software Heritage does
+not allow requests to save arbitrary tarballs; we are working on ways to
+ensure that non-VCS source code is also archived.
+
+Software Heritage
+@uref{https://archive.softwareheritage.org/api/#rate-limiting, limits the
+request rate per IP address}.  When the limit is reached, @command{guix lint}
+prints a message and the @code{archival} checker stops doing anything until
+that limit has been reset.
+
 @item cve
 @cindex security vulnerabilities
 @cindex CVE, Common Vulnerabilities and Exposures
diff --git a/guix/lint.scm b/guix/lint.scm
index 2bf5097403..98ac77556e 100644
--- a/guix/lint.scm
+++ b/guix/lint.scm
@@ -44,6 +44,8 @@
   #:use-module ((guix ui) #:select (texi->plain-text fill-paragraph))
   #:use-module (guix gnu-maintenance)
   #:use-module (guix cve)
+  #:use-module ((guix swh) #:hide (origin?))
+  #:autoload   (guix git-download) (git-reference?)
   #:use-module (ice-9 match)
   #:use-module (ice-9 regex)
   #:use-module (ice-9 format)
@@ -80,6 +82,7 @@
             check-vulnerabilities
             check-for-updates
             check-formatting
+            check-archival
 
             lint-warning
             lint-warning?
@@ -1023,6 +1026,93 @@ the NIST server non-fatal."
          '()))
     (#f '()))) ; cannot find newer upstream release
 
+
+(define (check-archival package)
+  "Check whether PACKAGE's source code is archived on Software Heritage.  If
+it's not, and if its source code is a VCS snapshot, then send a \"save\"
+request to Software Heritage.
+
+Software Heritage imposes limits on the request rate per client IP address.
+This checker prints a notice and stops doing anything once that limit has been
+reached."
+  (define (response->warning url method response)
+    (if (request-rate-limit-reached? url method)
+        (list (make-warning package
+                            (G_ "Software Heritage rate limit reached; \
+try again later")
+                            #:field 'source))
+        (list (make-warning package
+                            (G_ "'~a' returned ~a")
+                            (list url (response-code response))
+                            #:field 'source))))
+
+  (define skip-key (gensym "skip-archival-check"))
+
+  (define (skip-when-limit-reached url method)
+    (or (not (request-rate-limit-reached? url method))
+        (throw skip-key #t)))
+
+  (parameterize ((%allow-request? skip-when-limit-reached))
+    (catch #t
+      (lambda ()
+        (match (and (origin? (package-source package))
+                    (package-source package))
+          (#f                                     ;no source
+           '())
+          ((= origin-uri (? git-reference? reference))
+           (define url
+             (git-reference-url reference))
+           (define commit
+             (git-reference-commit reference))
+
+           (match (if (commit-id? commit)
+                      (or (lookup-revision commit)
+                          (lookup-origin-revision url commit))
+                      (lookup-origin-revision url commit))
+             ((? revision? revision)
+              '())
+             (#f
+              ;; Revision is missing from the archive, attempt to save it.
+              (catch 'swh-error
+                (lambda ()
+                  (save-origin (git-reference-url reference) "git")
+                  (list (make-warning
+                         package
+                         ;; TRANSLATORS: "Software Heritage" is a proper noun
+                         ;; that must remain untranslated.  See
+                         ;; <https://www.softwareheritage.org>.
+                         (G_ "scheduled Software Heritage archival")
+                         #:field 'source)))
+                (lambda (key url method response . _)
+                  (cond ((= 429 (response-code response))
+                         (list (make-warning
+                                package
+                                (G_ "archival rate limit exceeded; \
+try again later")
+                                #:field 'source)))
+                        (else
+                         (response->warning url method response))))))))
+          ((? origin? origin)
+           ;; Since "save" origins are not supported for non-VCS source, all
+           ;; we can do is tell whether a given tarball is available or not.
+           (if (origin-sha256 origin)             ;XXX: for ungoogled-chromium
+               (match (lookup-content (origin-sha256 origin) "sha256")
+                 (#f
+                  (list (make-warning package
+                                      (G_ "source not archived on Software \
+Heritage")
+                                      #:field 'source)))
+                 ((? content?)
+                  '()))
+               '()))))
+      (match-lambda*
+        ((key url method response)
+         (response->warning url method response))
+        ((key . args)
+         (if (eq? key skip-key)
+             '()
+             (apply throw key args)))))))
+
 \f
 ;;;
 ;;; Source code formatting.
@@ -1227,7 +1317,11 @@ or a list thereof")
    (lint-checker
      (name        'refresh)
      (description "Check the package for new upstream releases")
-     (check       check-for-updates))))
+     (check       check-for-updates))
+   (lint-checker
+     (name        'archival)
+     (description "Ensure source code archival on Software Heritage")
+     (check       check-archival))))
 
 (define %all-checkers
   (append %local-checkers
diff --git a/tests/lint.scm b/tests/lint.scm
index c8b88136f4..1b92f02b85 100644
--- a/tests/lint.scm
+++ b/tests/lint.scm
@@ -35,6 +35,7 @@
   #:use-module (guix packages)
   #:use-module (guix lint)
   #:use-module (guix ui)
+  #:use-module (guix swh)
   #:use-module (gnu packages)
   #:use-module (gnu packages glib)
   #:use-module (gnu packages pkg-config)
@@ -47,6 +48,7 @@
   #:use-module (ice-9 regex)
   #:use-module (ice-9 getopt-long)
   #:use-module (ice-9 pretty-print)
+  #:use-module (rnrs bytevectors)
   #:use-module (srfi srfi-1)
   #:use-module (srfi srfi-9 gnu)
   #:use-module (srfi srfi-26)
@@ -859,6 +861,85 @@
   '()
   (check-formatting (dummy-package "x")))
 
+(test-assert "archival: missing content"
+  (let* ((origin   (origin
+                     (method url-fetch)
+                     (uri "http://example.org/foo.tgz")
+                     (sha256 (make-bytevector 32))))
+         (warnings (with-http-server '((404 "Not archived."))
+                     (parameterize ((%swh-base-url (%local-url)))
+                       (check-archival (dummy-package "x"
+                                                      (source origin)))))))
+    (warning-contains? "not archived" warnings)))
+
+(test-equal "archival: content available"
+  '()
+  (let* ((origin   (origin
+                     (method url-fetch)
+                     (uri "http://example.org/foo.tgz")
+                     (sha256 (make-bytevector 32))))
+         ;; https://archive.softwareheritage.org/api/1/content/
+         (content  "{ \"checksums\": {}, \"data_url\": \"xyz\",
+                      \"length\": 42 }"))
+    (with-http-server `((200 ,content))
+      (parameterize ((%swh-base-url (%local-url)))
+        (check-archival (dummy-package "x" (source origin)))))))
+
+(test-assert "archival: missing revision"
+  (let* ((origin   (origin
+                     (method git-fetch)
+                     (uri (git-reference
+                           (url "http://example.org/foo.git")
+                           (commit "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa")))
+                     (sha256 (make-bytevector 32))))
+         ;; https://archive.softwareheritage.org/api/1/origin/save/
+         (save     "{ \"origin_url\": \"http://example.org/foo.git\",
+                      \"save_request_date\": \"2014-11-17T22:09:38+01:00\",
+                      \"save_request_status\": \"accepted\",
+                      \"save_task_status\": \"scheduled\" }")
+         (warnings (with-http-server `((404 "No revision.") ;lookup-revision
+                                       (404 "No origin.")   ;lookup-origin
+                                       (200 ,save))         ;save-origin
+                     (parameterize ((%swh-base-url (%local-url)))
+                       (check-archival (dummy-package "x" (source origin)))))))
+    (warning-contains? "scheduled" warnings)))
+
+(test-equal "archival: revision available"
+  '()
+  (let* ((origin   (origin
+                     (method git-fetch)
+                     (uri (git-reference
+                           (url "http://example.org/foo.git")
+                           (commit "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa")))
+                     (sha256 (make-bytevector 32))))
+         ;; https://archive.softwareheritage.org/api/1/revision/
+         (revision "{ \"author\": {}, \"parents\": [],
+                      \"date\": \"2014-11-17T22:09:38+01:00\" }"))
+    (with-http-server `((200 ,revision))
+      (parameterize ((%swh-base-url (%local-url)))
+        (check-archival (dummy-package "x" (source origin)))))))
+
+(test-assert "archival: rate limit reached"
+  ;; We should get a single warning stating that the rate limit was reached,
+  ;; and nothing more, in particular no other HTTP requests.
+  (let* ((origin   (origin
+                     (method url-fetch)
+                     (uri "http://example.org/foo.tgz")
+                     (sha256 (make-bytevector 32))))
+         (too-many (build-response
+                    #:code 429
+                    #:reason-phrase "Too many requests"
+                    #:headers '((x-ratelimit-remaining . "0")
+                                (x-ratelimit-reset . "3000000000"))))
+         (warnings (with-http-server `((,too-many "Rate limit reached."))
+                     (parameterize ((%swh-base-url (%local-url)))
+                       (append-map (lambda (name)
+                                     (check-archival
+                                      (dummy-package name (source origin))))
+                                   '("x" "y" "z"))))))
+    (string-contains (single-lint-warning-message warnings)
+                     "rate limit reached")))
+
 (test-end "lint")
 
 ;; Local Variables:
-- 
2.23.0

  parent reply	other threads:[~2019-08-29 23:22 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-29 23:16 [bug#37224] [PATCH 0/4] Add 'archival' checker for 'guix lint' Ludovic Courtès
2019-08-29 23:20 ` [bug#37224] [PATCH 1/4] tests: 'with-http-server' accepts multiple responses Ludovic Courtès
2019-08-29 23:20   ` [bug#37224] [PATCH 2/4] swh: Add hooks for rate limiting handling Ludovic Courtès
2019-08-29 23:21   ` [bug#37224] [PATCH 3/4] swh: Make 'commit-id?' public Ludovic Courtès
2019-08-29 23:21   ` Ludovic Courtès [this message]
2019-09-02 13:28 ` bug#37224: [PATCH 0/4] Add 'archival' checker for 'guix lint' Ludovic Courtès
2019-09-11 10:20 ` [bug#37224] " zimoun
2019-09-12  7:41   ` Ludovic Courtès
2019-09-12  9:52     ` zimoun
2019-09-13  8:49       ` Ludovic Courtès

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://guix.gnu.org/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190829232101.8153-4-ludo@gnu.org \
    --to=ludo@gnu.org \
    --cc=37224@debbugs.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).