all messages for Guix-related lists mirrored at yhetil.org
* bug#44187: whishlist: time-machine --channel falls back to SWH
@ 2020-10-23 22:17 zimoun
  2021-03-05 14:51 ` Ludovic Courtès
  2021-09-17  8:02 ` bug#44187: Channel clones lack SWH fallback zimoun
  0 siblings, 2 replies; 15+ messages in thread
From: zimoun @ 2020-10-23 22:17 UTC (permalink / raw)
  To: 44187

Dear,

Let’s describe the use case.  Consider that:

  guix time-machine -C channels -- install foo

is provided in some documentation, say a scientific paper, where the
channels.scm file is completely specified:

--8<---------------cut here---------------start------------->8---
(list (channel
        (name 'kikoo)
        (url "https://example.org/that-great.git")
        (commit
          "353bdae32f72b720c7ddd706576ccc40e2b43f95")))
--8<---------------cut here---------------end--------------->8---

In the future, if https://example.org/that-great.git disappears, then
building or installing the package ’foo’ becomes difficult, if not impossible.

However, let’s consider that the repo ’that-great’ had been saved in SWH
(say manually), since it is a regular Git repo.  Guix should be able to
fall back to it transparently.


Obviously, another wishlist item is to have something that eases the save
request of the channel on SWH.  Maybe the latter could be part of the
several-times discussed “guix channel” subcommand. :-)


All the best,
simon





* bug#44187: whishlist: time-machine --channel falls back to SWH
  2020-10-23 22:17 bug#44187: whishlist: time-machine --channel falls back to SWH zimoun
@ 2021-03-05 14:51 ` Ludovic Courtès
  2021-09-10 14:34   ` bug#44187: [PATCH 0/3] Fall back to Software Heritage (SWH) for Git clones Ludovic Courtès
  2021-09-17  8:02 ` bug#44187: Channel clones lack SWH fallback zimoun
  1 sibling, 1 reply; 15+ messages in thread
From: Ludovic Courtès @ 2021-03-05 14:51 UTC (permalink / raw)
  To: zimoun; +Cc: 44187

[-- Attachment #1: Type: text/plain, Size: 1635 bytes --]

Hi,

zimoun <zimon.toutoune@gmail.com> wrote:

> Let’s describe the use case.  Consider that:
>
>   guix time-machine -C channels -- install foo
>
> is provided in some documentation, say a scientific paper, where the
> channels.scm file is completely specified:
>
> (list (channel
>         (name 'kikoo)
>         (url "https://example.org/that-great.git")
>         (commit
>           "353bdae32f72b720c7ddd706576ccc40e2b43f95")))
>
> In the future, if https://example.org/that-great.git disappears, then
> building or installing the package ’foo’ becomes difficult, if not impossible.
>
> However, let’s consider that the repo ’that-great’ had been saved in SWH
> (say manually), since it is a regular Git repo.  Guix should be able to
> fall back to it transparently.

I went head-down to add SWH fallback to ‘latest-repository-commit’… but
that’s of no use because (guix channels) wants a complete clone so that
it can determine commit relations (to detect downgrades).
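
To make the constraint concrete: downgrade detection is an ancestry query
on the commit graph, which a flat checkout cannot answer.  Below is a
minimal sketch in plain Guile, assuming a hypothetical graph stored as an
association list; the names %graph, ancestor? and downgrade? are
illustrative only and are not the (guix channels) implementation.

```scheme
;; Hypothetical commit graph: each entry is (COMMIT PARENT ...).
(define %graph
  '(("c3" "c2")
    ("c2" "c1")
    ("c1")))

(define (ancestor? graph old new)
  "Return #t if OLD is reachable from NEW by following parent links."
  (let loop ((todo (list new))
             (seen '()))
    (cond ((null? todo) #f)
          ((member (car todo) seen) (loop (cdr todo) seen))
          ((string=? (car todo) old) #t)
          (else
           ;; Unknown commits are treated as having no parents.
           (let ((parents (or (assoc-ref graph (car todo)) '())))
             (loop (append parents (cdr todo))
                   (cons (car todo) seen)))))))

(define (downgrade? graph current target)
  "Return #t if moving from CURRENT to TARGET goes back in history,
that is, if TARGET is a strict ancestor of CURRENT."
  (and (not (string=? current target))
       (ancestor? graph target current)))
```

With this toy graph, (downgrade? %graph "c3" "c1") is true, while
(downgrade? %graph "c1" "c3") is false.  Without the parent links there
is nothing to walk, hence the need for a full clone.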

The SWH vault gives access to checkouts primarily, but it’s also
possible to get a full repo in ‘git fast-import’ format, which is what
we need:

  https://archive.softwareheritage.org/api/1/vault/revision/gitfast/doc/

However, according to SWH developers, this API will eventually be replaced
by some other solution, possibly a bare Git repo export, so it may not be a
good idea to build upon it.

If we were able, using the SWH API, to map “revisions” to “origins”, we
could find potential mirrors hosting a given commit, but apparently
that’s not possible.

To be continued…

Ludo’.



[-- Attachment #2: Type: text/x-patch, Size: 2914 bytes --]

diff --git a/guix/git.scm b/guix/git.scm
index a5103547d3..449011c51a 100644
--- a/guix/git.scm
+++ b/guix/git.scm
@@ -32,6 +32,7 @@
   #:use-module (guix records)
   #:use-module (guix gexp)
   #:use-module (guix sets)
+  #:autoload   (guix swh) (swh-download)
   #:use-module ((guix diagnostics) #:select (leave))
   #:use-module (guix progress)
   #:use-module (rnrs bytevectors)
@@ -459,22 +460,43 @@ Log progress and checkout info to LOG-PORT."
                   (eq? 'regular (stat:type stat))))))
 
   (format log-port "updating checkout of '~a'...~%" url)
-  (let*-values
-      (((checkout commit _)
-        (update-cached-checkout url
-                                #:recursive? recursive?
-                                #:ref ref
-                                #:cache-directory
-                                (url-cache-directory url cache-directory
-                                                     #:recursive?
-                                                     recursive?)
-                                #:log-port log-port))
-       ((name)
-        (url+commit->name url commit)))
-    (format log-port "retrieved commit ~a~%" commit)
-    (values (add-to-store store name #t "sha256" checkout
-                          #:select? (negate dot-git?))
-            commit)))
+
+  (catch 'git-error
+    (lambda ()
+      (let*-values
+          (((checkout commit _)
+            (update-cached-checkout (pk 'l-r-c url)
+                                    #:recursive? recursive?
+                                    #:ref ref
+                                    #:cache-directory
+                                    (url-cache-directory url cache-directory
+                                                         #:recursive?
+                                                         recursive?)
+                                    #:log-port log-port))
+           ((name)
+            (url+commit->name url commit)))
+        (format log-port "retrieved commit ~a~%" commit)
+        (values (add-to-store store name #t "sha256" checkout
+                              #:select? (negate dot-git?))
+                commit)))
+    (lambda (key err . rest)
+      ;; XXX: 'swh-download' currently doesn't support submodules.
+      (when recursive?
+        (apply throw key err rest))
+
+      (pk 'err key err rest)
+      (match ref
+        (('commit . commit)
+         ;; Attempt to fetch COMMIT from SWH.
+         (call-with-temporary-directory
+          (lambda (directory)
+            (unless (swh-download url commit directory)
+              (apply throw key err rest))
+            (values (add-to-store store (url+commit->name url commit)
+                                  #t "sha256" directory)
+                    commit))))
+        (_
+         (apply throw key err rest))))))
 
 (define (print-git-error port key args default-printer)
   (match args


* bug#44187: [PATCH 0/3] Fall back to Software Heritage (SWH) for Git clones
  2021-03-05 14:51 ` Ludovic Courtès
@ 2021-09-10 14:34   ` Ludovic Courtès
  2021-09-10 14:34     ` bug#44187: [PATCH 1/3] swh: Support downloads of bare Git repositories Ludovic Courtès
                       ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Ludovic Courtès @ 2021-09-10 14:34 UTC (permalink / raw)
  To: 44187

Hi!

A bit of context: we already had automatic SWH fallback for Git checkouts,
which is to say that any origin that uses ‘git-fetch’ would have its
checkout transparently fetched from SWH if upstream vanished (this
dates back to commit 608d3dca89d73fe7260e97a284a8aeea756a3e11, Nov. 2018).

What this patch series provides is SWH fallback for full Git clones (as
opposed to flat checkouts).  It works for anything that uses (guix git).
That includes <git-checkout>, used by transformation options:

--8<---------------cut here---------------start------------->8---
$ ./pre-inst-env guix build footswitch --with-git-url=footswitch=http://example.org/sdf --with-commit=footswitch=1eabc563ca5692b3e08d84f1f0e6fd2283284469 -n
updating checkout of 'http://example.org/sdf'...
SWH: found revision 1eabc563ca5692b3e08d84f1f0e6fd2283284469 with directory at 'https://archive.softwareheritage.org/api/1/directory/ad8976564375ee55f645387bbcdf4b66e6582fbf/'
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/HEAD
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/branches/
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/config
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/description
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/hooks/
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/hooks/applypatch-msg.sample
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/hooks/commit-msg.sample
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/hooks/fsmonitor-watchman.sample
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/hooks/post-update.sample
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/hooks/pre-applypatch.sample
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/hooks/pre-commit.sample
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/hooks/pre-push.sample
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/hooks/pre-rebase.sample
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/hooks/pre-receive.sample
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/hooks/prepare-commit-msg.sample
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/hooks/update.sample
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/info/
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/info/exclude
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/info/refs
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/objects/
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/objects/info/
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/objects/info/packs
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/objects/pack/
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/objects/pack/pack-ed28f44a2599fe2d0a5f1b1a84c247c43afd14a1.idx
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/objects/pack/pack-ed28f44a2599fe2d0a5f1b1a84c247c43afd14a1.pack
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/refs/
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/refs/heads/
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/refs/heads/master
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/refs/tags/
retrieved commit 1eabc563ca5692b3e08d84f1f0e6fd2283284469
substitute: updating substitutes from 'https://ci.guix.gnu.org'... 100.0%
substitute: updating substitutes from 'https://bayfront.guix.gnu.org'... 100.0%
The following derivation would be built:
   /gnu/store/39kzsy5kgj5150q6zgckc2hbxp999adw-footswitch-git.1eabc56.drv
--8<---------------cut here---------------end--------------->8---

In the example above, we pass a bogus Git URL, but since the target
commit is known, (guix git) automatically fetches a bare Git repository
from the SWH vault.
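
The control flow can be sketched as follows.  This is a toy model in plain
Guile, not the code from this series: the 'clone-error key and the
procedures clone-from-upstream and fetch-from-archive are hypothetical
stand-ins for the Git error and for the upstream and SWH code paths.

```scheme
(define (call-with-fallback primary fallback)
  "Call the thunk PRIMARY.  If it throws 'clone-error, call the thunk
FALLBACK instead; if FALLBACK returns #f, re-throw the original error."
  (catch 'clone-error
    primary
    (lambda (key . args)
      (or (fallback)
          (apply throw key args)))))

;; Hypothetical stand-ins: upstream always fails, the archive succeeds.
(define (clone-from-upstream url)
  (throw 'clone-error url "unexpected http status code: 404"))

(define (fetch-from-archive commit)
  (string-append "archived:" commit))

(define result
  (call-with-fallback
   (lambda ()
     (clone-from-upstream "http://example.org/sdf"))
   (lambda ()
     (fetch-from-archive "1eabc563ca5692b3e08d84f1f0e6fd2283284469"))))
```

The point of the design is that the caller never sees the failure as long
as the archive can serve the requested commit; the original error only
resurfaces when both paths fail.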

It also works for channels, which is what zimoun reported here:

--8<---------------cut here---------------start------------->8---
$ cat /tmp/chan.scm
(list (channel
        (name 'guix)
        (url "https://git.savannah.gnu.org/git/guix.git")
        (commit
          "f91ae9425bb385b60396a544afe27933896b8fa3")
        (introduction
          (make-channel-introduction
            "9edb3f66fd807b096b48283debdcddccfea34bad"
            (openpgp-fingerprint
             "BBB0 2DDF 2CEA F6A8 0D1D  E643 A2A0 6DF2 A33A 54FA"))))
      (channel
       (name 'guix-past)
       (url "https://does-not-exist.inria.fr/guix-hpc/guix-past")
       (commit "77e183dc7ade307ad3409fad4b71f12e266de910")
       #;(introduction
        (make-channel-introduction
         "0c119db2ea86a389769f4d2b9c6f5c41c027e336"
         (openpgp-fingerprint
          "3CE4 6455 8A84 FDC6 9DB4  0CFB 090B 1199 3D9A EBB5")))))
$ ./pre-inst-env guix time-machine -C /tmp/chan.scm -- describe
Updating channel 'guix' from Git repository at 'https://git.savannah.gnu.org/git/guix.git'...
Updating channel 'guix-past' from Git repository at 'https://does-not-exist.inria.fr/guix-hpc/guix-past'...
SWH: found revision 77e183dc7ade307ad3409fad4b71f12e266de910 with directory at 'https://archive.softwareheritage.org/api/1/directory/7c6aa10e1e0fa54199566145c6a453731872b87d/'
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/HEAD
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/branches/
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/config
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/description
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/hooks/
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/info/
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/info/exclude
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/info/refs
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/objects/
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/objects/info/
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/objects/info/packs
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/objects/pack/
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/objects/pack/pack-e6c0a4813509178eed735708dd60503353a50b9c.idx
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/objects/pack/pack-e6c0a4813509178eed735708dd60503353a50b9c.pack
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/refs/
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/refs/heads/
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/refs/heads/master
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/refs/tags/
Computing Guix derivation for 'x86_64-linux'... \  C-c C-c
--8<---------------cut here---------------end--------------->8---

Here, the ‘guix-past’ channel is transparently cloned from SWH.  This
is pretty cool, because having the whole repo around is what permits
things like downgrade prevention¹ and news support².

  Finally we can enjoy content-addressability and brittle URLs
  are becoming a thing of the past!*


Limitations
~~~~~~~~~~~~

Yes, there’s a couple of them.

First, fallback is implemented only for fresh clones, not for updates.
Thus, if I rerun the first example, having now the clone in
~/.cache/guix/checkouts, with a different commit, I get:

--8<---------------cut here---------------start------------->8---
$ ./pre-inst-env guix build footswitch --with-git-url=footswitch=http://example.org/sdf --with-commit=footswitch=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa -n
updating checkout of 'http://example.org/sdf'...
guix build: error: Git failure while fetching http://example.org/sdf: unexpected http status code: 404
--8<---------------cut here---------------end--------------->8---
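
The distinction between a fresh clone and an update can be summarized by a
small decision procedure.  This is only a sketch of the three cases as
described in this series; checkout-strategy and the symbols it returns are
hypothetical names.

```scheme
(define (checkout-strategy cache-exists? reference-available?)
  "Return a symbol naming the path taken for a cached checkout."
  (cond ((not cache-exists?) 'clone-with-swh-fallback) ;fresh clone
        (reference-available? 'use-cached-clone)        ;nothing to fetch
        (else 'fetch-from-upstream)))                   ;no SWH fallback here
```

The failing command above corresponds to the last case: the cache exists,
the requested commit is not in it, and the fetch goes to the dead URL.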

Second, clones from SWH only contain the one branch that the revision
is on.  For channels, that means that the ‘keyring’ branch is not fetched,
which is why I commented out ‘introduction’ in /tmp/chan.scm above.
If I uncomment it, I get:

--8<---------------cut here---------------start------------->8---
$ ./pre-inst-env guix time-machine -C /tmp/chan.scm -- describe
Updating channel 'guix' from Git repository at 'https://git.savannah.gnu.org/git/guix.git'...
Updating channel 'guix-past' from Git repository at 'https://does-not-exist.inria.fr/guix-hpc/guix-past'...
guix time-machine: error: Git error: cannot locate remote-tracking branch 'origin/keyring'
--8<---------------cut here---------------end--------------->8---

The SWH folks tell me it’ll eventually be possible to map a revision
to its containing snapshot(s) via the HTTP API, and to obtain entire
snapshots (i.e., the repo and all its branches) from the vault.  That’s
what we need to fix this issue.

*Third, and this answers the asterisk above, we must keep in mind that
this is content-addressability *with SHA1*.  Generating a chosen-prefix
collision is becoming affordable³, so users absolutely need an additional
mechanism to authenticate code they fetched.

For origins, we have the content SHA256, so we’re fine.  For channels,
we have Guix’s authentication mechanism¹, except it’s not available yet
via SWH, as I wrote above.  For the footswitch example above using
‘--with-commit’, we don’t have any authentication method, but in fact,
that’s the situation of Git repositories in general: they can rarely be
authenticated.

Overall, I think it’s a step in the right direction.

Thoughts?

Thanks to vlorentz and olasd on #swh-devel for their support!

Thanks,
Ludo’.

¹ https://guix.gnu.org/en/blog/2020/securing-updates/
² https://guix.gnu.org/en/blog/2019/spreading-the-news/
³ https://sha-mbles.github.io/

Ludovic Courtès (3):
  swh: Support downloads of bare Git repositories.
  git: 'update-cached-checkout' can fall back to SWH when cloning.
  git: 'reference-available?' recognizes 'tag-or-commit'.

 guix/git.scm | 45 +++++++++++++++++++++++++++++++++++++++++++--
 guix/swh.scm | 52 ++++++++++++++++++++++++++++++++++++++++------------
 2 files changed, 83 insertions(+), 14 deletions(-)

-- 
2.33.0






* bug#44187: [PATCH 1/3] swh: Support downloads of bare Git repositories.
  2021-09-10 14:34   ` bug#44187: [PATCH 0/3] Fall back to Software Heritage (SWH) for Git clones Ludovic Courtès
@ 2021-09-10 14:34     ` Ludovic Courtès
  2021-09-17 17:31       ` bug#44187: Channel clones lack SWH fallback zimoun
  2021-09-10 14:34     ` bug#44187: [PATCH 2/3] git: 'update-cached-checkout' can fall back to SWH when cloning Ludovic Courtès
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Ludovic Courtès @ 2021-09-10 14:34 UTC (permalink / raw)
  To: 44187; +Cc: Ludovic Courtès

From: Ludovic Courtès <ludovic.courtes@inria.fr>

* guix/swh.scm (swh-download-archive): New procedure.
(swh-download-directory): Rewrite in terms of 'swh-download-archive'.
(swh-download): Add #:archive-type and honor it.  Use
'swh-download-archive' instead of 'swh-download-directory'.
---
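
Note for readers of the diff: the archive type drives two choices, namely
which SWHID is requested from the vault and how the resulting tarball is
unpacked.  A minimal standalone sketch of that mapping, where
archive-swhid and tar-extract-flags are hypothetical helper names that do
not exist in guix/swh.scm:

```scheme
(use-modules (ice-9 match))

(define (archive-swhid archive-type directory-id revision-id)
  "Return the SWHID to request for ARCHIVE-TYPE: the directory for a flat
checkout, the revision for a bare Git repository."
  (match archive-type
    ('flat     (string-append "swh:1:dir:" directory-id))
    ('git-bare (string-append "swh:1:rev:" revision-id))))

(define (tar-extract-flags archive-type)
  "Return the tar flags used to unpack an archive of ARCHIVE-TYPE."
  (match archive-type
    ('flat     "-xzvf")                   ;gzipped
    ('git-bare "-xvf")))                  ;uncompressed
```

For instance, (archive-swhid 'git-bare DIR REV) yields a "swh:1:rev:"
identifier, which matches the directory names seen in the transcripts of
the cover letter.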
 guix/swh.scm | 52 ++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 40 insertions(+), 12 deletions(-)

diff --git a/guix/swh.scm b/guix/swh.scm
index 3d5d2a410a..707551a799 100644
--- a/guix/swh.scm
+++ b/guix/swh.scm
@@ -645,20 +645,29 @@ delete it when leaving the dynamic extent of this call."
       (lambda ()
         (false-if-exception (delete-file-recursively tmp-dir))))))
 
-(define* (swh-download-directory id output
-                                 #:key (log-port (current-error-port)))
-  "Download from Software Heritage the directory with the given ID, and
-unpack it to OUTPUT.  Return #t on success and #f on failure"
+(define* (swh-download-archive swhid output
+                               #:key
+                               (archive-type 'flat)
+                               (log-port (current-error-port)))
+  "Download from Software Heritage the directory or revision with the given
+SWHID, in the ARCHIVE-TYPE format (one of 'flat or 'git-bare), and unpack it to
+OUTPUT.  Return #t on success and #f on failure."
   (call-with-temporary-directory
    (lambda (directory)
-     (match (vault-fetch id 'directory #:log-port log-port)
+     (match (vault-fetch swhid
+                         #:archive-type archive-type
+                         #:log-port log-port)
        (#f
         (format log-port
-                "SWH: directory ~a could not be fetched from the vault~%"
-                id)
+                "SWH: object ~a could not be fetched from the vault~%"
+                swhid)
         #f)
        ((? port? input)
-        (let ((tar (open-pipe* OPEN_WRITE "tar" "-C" directory "-xzvf" "-")))
+        (let ((tar (open-pipe* OPEN_WRITE "tar" "-C" directory
+                               (match archive-type
+                                 ('flat "-xzvf")     ;gzipped
+                                 ('git-bare "-xvf")) ;uncompressed
+                               "-")))
           (dump-port input tar)
           (close-port input)
           (let ((status (close-pipe tar)))
@@ -672,6 +681,14 @@ unpack it to OUTPUT.  Return #t on success and #f on failure"
                                #:log (%make-void-port "w"))
              #t))))))))
 
+(define* (swh-download-directory id output
+                                 #:key (log-port (current-error-port)))
+  "Download from Software Heritage the directory with the given ID, and
+unpack it to OUTPUT.  Return #t on success and #f on failure."
+  (swh-download-archive (string-append "swh:1:dir:" id) output
+                        #:archive-type 'flat
+                        #:log-port log-port))
+
 (define (commit-id? reference)
   "Return true if REFERENCE is likely a commit ID, false otherwise---e.g., if
 it is a tag name.  This is based on a simple heuristic so use with care!"
@@ -679,8 +696,11 @@ it is a tag name.  This is based on a simple heuristic so use with care!"
        (string-every char-set:hex-digit reference)))
 
 (define* (swh-download url reference output
-                       #:key (log-port (current-error-port)))
-  "Download from Software Heritage a checkout of the Git tag or commit
+                       #:key
+                       (archive-type 'flat)
+                       (log-port (current-error-port)))
+  "Download from Software Heritage a checkout (if ARCHIVE-TYPE is 'flat) or a
+full Git repository (if ARCHIVE-TYPE is 'git-bare) of the Git tag or commit
 REFERENCE originating from URL, and unpack it in OUTPUT.  Return #t on success
 and #f on failure.
 
@@ -694,7 +714,15 @@ wait until it becomes available, which could take several minutes."
      (format log-port "SWH: found revision ~a with directory at '~a'~%"
              (revision-id revision)
              (swh-url (revision-directory-url revision)))
-     (swh-download-directory (revision-directory revision) output
-                             #:log-port log-port))
+     (swh-download-archive (match archive-type
+                             ('flat
+                              (string-append
+                               "swh:1:dir:" (revision-directory revision)))
+                             ('git-bare
+                              (string-append
+                               "swh:1:rev:" (revision-id revision))))
+                           output
+                           #:archive-type archive-type
+                           #:log-port log-port))
     (#f
      #f)))
-- 
2.33.0






* bug#44187: [PATCH 2/3] git: 'update-cached-checkout' can fall back to SWH when cloning.
  2021-09-10 14:34   ` bug#44187: [PATCH 0/3] Fall back to Software Heritage (SWH) for Git clones Ludovic Courtès
  2021-09-10 14:34     ` bug#44187: [PATCH 1/3] swh: Support downloads of bare Git repositories Ludovic Courtès
@ 2021-09-10 14:34     ` Ludovic Courtès
  2021-09-10 14:34     ` bug#44187: [PATCH 3/3] git: 'reference-available?' recognizes 'tag-or-commit' Ludovic Courtès
  2021-09-13 16:07     ` bug#44187: [PATCH 0/3] Fall back to Software Heritage (SWH) for Git clones zimoun
  3 siblings, 0 replies; 15+ messages in thread
From: Ludovic Courtès @ 2021-09-10 14:34 UTC (permalink / raw)
  To: 44187; +Cc: Ludovic Courtès

From: Ludovic Courtès <ludovic.courtes@inria.fr>

Fixes <https://issues.guix.gnu.org/44187>.
Reported by zimoun <zimon.toutoune@gmail.com>.

* guix/git.scm (GITERR_HTTP): New variable.
(clone-from-swh, clone/swh-fallback): New procedures.
(update-cached-checkout): Use 'clone/swh-fallback' instead of 'clone*'.
---
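
Note for readers of the diff: the fallback is attempted only when two
conditions hold, namely that the reference pins a commit (or a
tag-or-commit) and that the failure looks like an unreachable URL.  A
standalone sketch of that predicate, in which the symbols 'http and 'net
are hypothetical stand-ins for the GITERR_HTTP and GITERR_NET error
classes and fall-back-to-swh? is an illustrative name:

```scheme
(use-modules (ice-9 match))

(define (fall-back-to-swh? ref error-class)
  "Return #t if a failed clone for REF with ERROR-CLASS should be retried
against the archive."
  (match ref
    (((or 'commit 'tag-or-commit) . _)
     ;; 'memq' returns a list on success; normalize it to a boolean.
     (and (memq error-class '(http net)) #t))
    (_ #f)))
```

A branch reference such as '(branch . "master") is never eligible, since
the archive is queried by revision and a branch name does not identify
one.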
 guix/git.scm | 42 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 41 insertions(+), 1 deletion(-)

diff --git a/guix/git.scm b/guix/git.scm
index acc48fd12f..377e09888a 100644
--- a/guix/git.scm
+++ b/guix/git.scm
@@ -36,6 +36,7 @@
   #:use-module (guix sets)
   #:use-module ((guix diagnostics) #:select (leave))
   #:use-module (guix progress)
+  #:autoload   (guix swh) (swh-download)
   #:use-module (rnrs bytevectors)
   #:use-module (ice-9 format)
   #:use-module (ice-9 match)
@@ -180,6 +181,13 @@ the 'SSL_CERT_FILE' and 'SSL_CERT_DIR' environment variables."
       (lambda args
         (make-fetch-options auth-method)))))
 
+(define GITERR_HTTP
+  ;; Guile-Git <= 0.5.2 lacks this constant.
+  (let ((errors (resolve-interface '(git errors))))
+    (if (module-defined? errors 'GITERR_HTTP)
+        (module-ref errors 'GITERR_HTTP)
+        34)))
+
 (define (clone* url directory)
   "Clone git repository at URL into DIRECTORY.  Upon failure,
 make sure no empty directory is left behind."
@@ -342,6 +350,38 @@ definitely available in REPOSITORY, false otherwise."
     (_
      #f)))
 
+(define (clone-from-swh url tag-or-commit output)
+  "Attempt to clone TAG-OR-COMMIT (a string), which originates from URL, using
+a copy archived at Software Heritage."
+  (call-with-temporary-directory
+   (lambda (bare)
+     (and (swh-download url tag-or-commit bare
+                        #:archive-type 'git-bare)
+          (let ((repository (clone* bare output)))
+            (remote-set-url! repository "origin" url)
+            repository)))))
+
+(define (clone/swh-fallback url ref cache-directory)
+  "Like 'clone', but fallback to Software Heritage if the repository cannot be
+found at URL."
+  (define (inaccessible-url-error? err)
+    (let ((class (git-error-class err))
+          (code  (git-error-code err)))
+      (or (= class GITERR_HTTP)                   ;404 or similar
+          (= class GITERR_NET))))                 ;unknown host, etc.
+
+  (catch 'git-error
+    (lambda ()
+      (clone* url cache-directory))
+    (lambda (key err)
+      (match ref
+        (((or 'commit 'tag-or-commit) . commit)
+         (if (inaccessible-url-error? err)
+             (or (clone-from-swh url commit cache-directory)
+                 (throw key err))
+             (throw key err)))
+        (_ (throw key err))))))
+
 (define cached-checkout-expiration
   ;; Return the expiration time procedure for a cached checkout.
   ;; TODO: Honor $GUIX_GIT_CACHE_EXPIRATION.
@@ -408,7 +448,7 @@ it unchanged."
    (let* ((cache-exists? (openable-repository? cache-directory))
           (repository    (if cache-exists?
                              (repository-open cache-directory)
-                             (clone* url cache-directory))))
+                             (clone/swh-fallback url ref cache-directory))))
      ;; Only fetch remote if it has not been cloned just before.
      (when (and cache-exists?
                 (not (reference-available? repository ref)))
-- 
2.33.0






* bug#44187: [PATCH 3/3] git: 'reference-available?' recognizes 'tag-or-commit'.
  2021-09-10 14:34   ` bug#44187: [PATCH 0/3] Fall back to Software Heritage (SWH) for Git clones Ludovic Courtès
  2021-09-10 14:34     ` bug#44187: [PATCH 1/3] swh: Support downloads of bare Git repositories Ludovic Courtès
  2021-09-10 14:34     ` bug#44187: [PATCH 2/3] git: 'update-cached-checkout' can fall back to SWH when cloning Ludovic Courtès
@ 2021-09-10 14:34     ` Ludovic Courtès
  2021-09-13 16:07     ` bug#44187: [PATCH 0/3] Fall back to Software Heritage (SWH) for Git clones zimoun
  3 siblings, 0 replies; 15+ messages in thread
From: Ludovic Courtès @ 2021-09-10 14:34 UTC (permalink / raw)
  To: 44187

* guix/git.scm (reference-available?): Handle 'tag-or-commit' with a
40-digit hex string.
---
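
Note for readers of the diff: the change amounts to treating a
tag-or-commit reference as a commit when its value looks like a full
commit ID.  A standalone sketch under that reading, where full-commit-id?
and ref->commit are hypothetical names and the 40-hex-digit test is a
simplification of the heuristic in 'commit-id?':

```scheme
(use-modules (ice-9 match))

(define (full-commit-id? str)
  "Return #t if STR is a 40-character hexadecimal string."
  (and (= (string-length str) 40)
       (string-every char-set:hex-digit str)))

(define (ref->commit ref)
  "Return the commit ID carried by REF, or #f if REF does not name one."
  (match ref
    ((or ('commit . commit)
         ('tag-or-commit . (? full-commit-id? commit)))
     commit)
    (_ #f)))
```

With this, '(tag-or-commit . "v1.0") yields #f, so a tag name still goes
through the regular lookup, while a 40-digit value is looked up directly
as a commit.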
 guix/git.scm | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/guix/git.scm b/guix/git.scm
index 377e09888a..33a111b84a 100644
--- a/guix/git.scm
+++ b/guix/git.scm
@@ -36,7 +36,7 @@
   #:use-module (guix sets)
   #:use-module ((guix diagnostics) #:select (leave))
   #:use-module (guix progress)
-  #:autoload   (guix swh) (swh-download)
+  #:autoload   (guix swh) (swh-download commit-id?)
   #:use-module (rnrs bytevectors)
   #:use-module (ice-9 format)
   #:use-module (ice-9 match)
@@ -340,7 +340,8 @@ dynamic extent of EXP."
   "Return true if REF, a reference such as '(commit . \"cabba9e\"), is
 definitely available in REPOSITORY, false otherwise."
   (match ref
-    (('commit . commit)
+    ((or ('commit . commit)
+         ('tag-or-commit . (? commit-id? commit)))
      (let ((len (string-length commit))
            (oid (string->oid commit)))
        (false-if-git-not-found
-- 
2.33.0






* bug#44187: [PATCH 0/3] Fall back to Software Heritage (SWH) for Git clones
  2021-09-10 14:34   ` bug#44187: [PATCH 0/3] Fall back to Software Heritage (SWH) for Git clones Ludovic Courtès
                       ` (2 preceding siblings ...)
  2021-09-10 14:34     ` bug#44187: [PATCH 3/3] git: 'reference-available?' recognizes 'tag-or-commit' Ludovic Courtès
@ 2021-09-13 16:07     ` zimoun
  2021-09-14 13:37       ` Ludovic Courtès
  3 siblings, 1 reply; 15+ messages in thread
From: zimoun @ 2021-09-13 16:07 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 44187

Hi Ludo,

Cool!  However, the patch does not apply on top of 53f54d4aa2.
That's why the option '--base' of "git format-patch" is really helpful. ;-)

Which commit does the patch set apply onto, so that I can try and review it? :-)

Cheers,
simon





* bug#44187: [PATCH 0/3] Fall back to Software Heritage (SWH) for Git clones
  2021-09-13 16:07     ` bug#44187: [PATCH 0/3] Fall back to Software Heritage (SWH) for Git clones zimoun
@ 2021-09-14 13:37       ` Ludovic Courtès
  0 siblings, 0 replies; 15+ messages in thread
From: Ludovic Courtès @ 2021-09-14 13:37 UTC (permalink / raw)
  To: zimoun; +Cc: 44187

Hello,

zimoun <zimon.toutoune@gmail.com> wrote:

> Cool!  However, the patch does not apply on top of 53f54d4aa2.
> That's why the option '--base' of "git format-patch" is really helpful. ;-)

Ah!  It should apply on top of ff613c2b68aac539262822490448e637d8f315ba.

If not, I can rebase it and send an updated patch (I’ve been fiddling
with code in this area lately…).

Thanks,
Ludo’.





* bug#44187: Channel clones lack SWH fallback
  2020-10-23 22:17 bug#44187: whishlist: time-machine --channel falls back to SWH zimoun
  2021-03-05 14:51 ` Ludovic Courtès
@ 2021-09-17  8:02 ` zimoun
  2021-09-18 21:10   ` Ludovic Courtès
  1 sibling, 1 reply; 15+ messages in thread
From: zimoun @ 2021-09-17  8:02 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 44187

Hi,

On Fri, 10 Sep 2021 at 16:34, Ludovic Courtès <ludo@gnu.org> wrote:

>   Finally we can enjoy content-addressability and brittle URLs
>   are becoming a thing of the past!*

Yeah, it is awesome!

The original URL of the channel was
<https://github.com/zimoun/channel-example.git>, and this channel
defines a package whose upstream,
<https://github.com/zimoun/hello-example.git>, has also disappeared.  Note
that the URL in the package definition is not bogus… but using a bogus one
already worked. :-)

Everything is saved on SWH, so now it is all transparent!  From my point of
view, this is a killer feature for scientific folks. :-)

--8<---------------cut here---------------start------------->8---
$ cat /tmp/channels.scm
(list (channel
        (name 'guix)
        (url "/home/sitour/src/guix/guix")
        (branch "fix-44187")
        (commit
          "cdea76a2fdaf7705583a02081a6468d436b8df05"))
      (channel
        (name 'example)
        (url "https://example.org/foo.git")
        (commit
          "67c9f2143aa6f545419ae913b4ae02af4cd3effc")))

$ ./pre-inst-env guix time-machine -C /tmp/channels.scm --disable-authentication -- build hi
Updating channel 'guix' from Git repository at '/home/sitour/src/guix/guix'...
guix time-machine: warning: channel authentication disabled
Updating channel 'example' from Git repository at 'https://example.org/foo.git'...
SWH: found revision 67c9f2143aa6f545419ae913b4ae02af4cd3effc with directory at 'https://archive.softwareheritage.org/api/1/directory/fe423e88ce277d3fc230c88d408e42b14a3a458c/'
SWH vault: requested bundle cooking, waiting for completion...
swh:1:rev:67c9f2143aa6f545419ae913b4ae02af4cd3effc.git/
swh:1:rev:67c9f2143aa6f545419ae913b4ae02af4cd3effc.git/HEAD
swh:1:rev:67c9f2143aa6f545419ae913b4ae02af4cd3effc.git/branches/
swh:1:rev:67c9f2143aa6f545419ae913b4ae02af4cd3effc.git/config
swh:1:rev:67c9f2143aa6f545419ae913b4ae02af4cd3effc.git/description
swh:1:rev:67c9f2143aa6f545419ae913b4ae02af4cd3effc.git/hooks/
swh:1:rev:67c9f2143aa6f545419ae913b4ae02af4cd3effc.git/info/
swh:1:rev:67c9f2143aa6f545419ae913b4ae02af4cd3effc.git/info/exclude
swh:1:rev:67c9f2143aa6f545419ae913b4ae02af4cd3effc.git/info/refs
swh:1:rev:67c9f2143aa6f545419ae913b4ae02af4cd3effc.git/objects/
swh:1:rev:67c9f2143aa6f545419ae913b4ae02af4cd3effc.git/objects/info/
swh:1:rev:67c9f2143aa6f545419ae913b4ae02af4cd3effc.git/objects/info/packs
swh:1:rev:67c9f2143aa6f545419ae913b4ae02af4cd3effc.git/objects/pack/
swh:1:rev:67c9f2143aa6f545419ae913b4ae02af4cd3effc.git/objects/pack/pack-4e9279a1b64e4dda7bd9d84bb6b50bb1f80def08.idx
swh:1:rev:67c9f2143aa6f545419ae913b4ae02af4cd3effc.git/objects/pack/pack-4e9279a1b64e4dda7bd9d84bb6b50bb1f80def08.pack
swh:1:rev:67c9f2143aa6f545419ae913b4ae02af4cd3effc.git/refs/
swh:1:rev:67c9f2143aa6f545419ae913b4ae02af4cd3effc.git/refs/heads/
swh:1:rev:67c9f2143aa6f545419ae913b4ae02af4cd3effc.git/refs/heads/master
swh:1:rev:67c9f2143aa6f545419ae913b4ae02af4cd3effc.git/refs/tags/
guix time-machine: warning: channel authentication disabled

[...]

Computing Guix derivation for 'x86_64-linux'... -

[...]

building /gnu/store/6g9qlysbbk7p4609xrv82j0wzbib1y4r-git-checkout.drv...
guile: warning: failed to install locale
environment variable `PATH' set to `/gnu/store/378zjf2kgajcfd7mfr98jn5xyc5wa3qv-gzip-1.10/bin:/gnu/store/sf3rbvb6iqcphgm1afbplcs72hsywg25-tar-1.32/bin'
hint: Using 'master' as the name for the initial branch. This default branch name
hint: is subject to change. To configure the initial branch name to use in all
hint: of your new repositories, which will suppress this warning, call:
hint:
hint: 	git config --global init.defaultBranch <name>
hint:
hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
hint: 'development'. The just-created branch can be renamed via this command:
hint:
hint: 	git branch -m <name>
Initialized empty Git repository in /gnu/store/884nsva9r8wkp40kbqyvpj1ad57jc5dd-git-checkout/.git/
fatal: could not read Username for 'https://github.com': No such device or address
Failed to do a shallow fetch; retrying a full fetch...
fatal: could not read Username for 'https://github.com': No such device or address
git-fetch: '/gnu/store/5vai7bfrfkzv22dx13bxpszjrqyi78x6-git-minimal-2.33.0/bin/git fetch origin' failed with exit code 128
Trying content-addressed mirror at berlin.guix.gnu.org...
Trying content-addressed mirror at berlin.guix.gnu.org...
Trying to download from Software Heritage...
SWH: found revision e1eefd033b8a2c4c81babc6fde08ebb116c6abb8 with directory at 'https://archive.softwareheritage.org/api/1/directory/c3e538ed2de412d54c567ed7c8cfc46cbbc35d07/'
swh:1:dir:c3e538ed2de412d54c567ed7c8cfc46cbbc35d07/
swh:1:dir:c3e538ed2de412d54c567ed7c8cfc46cbbc35d07/ABOUT-NLS
swh:1:dir:c3e538ed2de412d54c567ed7c8cfc46cbbc35d07/AUTHORS
swh:1:dir:c3e538ed2de412d54c567ed7c8cfc46cbbc35d07/COPYING

[...]

swh:1:dir:c3e538ed2de412d54c567ed7c8cfc46cbbc35d07/tests/hello-1
swh:1:dir:c3e538ed2de412d54c567ed7c8cfc46cbbc35d07/tests/last-1
swh:1:dir:c3e538ed2de412d54c567ed7c8cfc46cbbc35d07/tests/traditional-1
successfully built /gnu/store/6g9qlysbbk7p4609xrv82j0wzbib1y4r-git-checkout.drv
building /gnu/store/jx1r7w8xaw768176pjl0j0q1l1529w75-hi-2.10.drv...
starting phase `set-SOURCE-DATE-EPOCH'
phase `set-SOURCE-DATE-EPOCH' succeeded after 0.0 seconds

[...]

successfully built /gnu/store/jx1r7w8xaw768176pjl0j0q1l1529w75-hi-2.10.drv
/gnu/store/jn8d031zx4znxy7s5zhj4dbr6xjsfq9v-hi-2.10
--8<---------------cut here---------------end--------------->8---

Well, it still lacks a fallback for tarballs and non-Git fetch methods;
with that, the story will be more than awesome! :-)

> Limitations
> ~~~~~~~~~~~~
>
> Yes, there’s a couple of them.

Well, yes some limitations but not so much. ;-)


> First, fallback is implemented only for fresh clones, not for updates.
> Thus, if I rerun the first example, having now the clone in
> ~/.cache/guix/checkouts, with a different commit, I get:

SWH is not a forge but an archive. :-)  Therefore, this update case does
not make sense to me.  I mean,

--8<---------------cut here---------------start------------->8---
$ git -C ~/.cache/guix/checkouts/6k7wvrcpbdsw3pje5b4squybw3jfn3viyrj7gcl7fipa5yjflaza fetch
fatal: repository 'http://example.org/sdf/' not found
--8<---------------cut here---------------end--------------->8---

Well, maybe the cached checkout could be removed when the commit is not
found in it, and the fetch retried from SWH.  Obviously, the downdate
case works.

Note that on fresh clone, the error message could be improved:

--8<---------------cut here---------------start------------->8---
$ ./pre-inst-env guix build guix --with-git-url=guix=https://example.org --with-commit=guix=ff613c2b68aac539262822490448e637d8f315ba -n
updating checkout of 'https://example.org'...
guix build: error: Git failure while fetching https://example.org: unexpected http status code: 404
--8<---------------cut here---------------end--------------->8---

where https://example.org is bogus and
ff613c2b68aac539262822490448e637d8f315ba is not yet archived on SWH.  It
could be nice to warn in addition to the 404 that it is not found in
SWH.  WDYT?


> Second, clones from SWH only contain the one branch that the revision
> is on.  For channels, that means that the ‘keyring’ branch is not fetched,
> which is why I commented out ‘introduction’ in /tmp/chan.scm above.

To me, it is not an issue.  Because you reach a commit from the past
knowing the hash.

My opinion aside, I wanted to know which kind of metadata we get back
from the Git repo, so I tried:

--8<---------------cut here---------------start------------->8---
$ guix build guix --with-git-url=guix=https://example.org --with-commit=guix=c75b30d58f0becb0a5cd6a8bfe69d1063b0d1ada -n
updating checkout of 'https://example.org'...
SWH: found revision c75b30d58f0becb0a5cd6a8bfe69d1063b0d1ada with directory at 'https://archive.softwareheritage.org/api/1/directory/ca2e8a7222b4850c7bea935dff86b9c2a905efd6/'
SWH vault: requested bundle cooking, waiting for completion...
SWH vault: Processing...
[...]
--8<---------------cut here---------------end--------------->8---

then after several hours, I get this:

--8<---------------cut here---------------start------------->8---
SWH vault: failure: Internal Server Error. This incident will be reported.
SWH vault: retrying...
SWH vault: requested bundle cooking, waiting for completion...
SWH vault: Processing...
--8<---------------cut here---------------end--------------->8---

and after more than 12h, the status is still: «SWH vault: Processing...»
and nothing is complete.
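
Since a cooking request may stay in «Processing...» for a long time, a
caller could bound the wait instead of polling forever.  Below is a
minimal sketch in Guile of that idea.  The names 'wait-for-completion',
POLL, #:max-attempts and #:pause are hypothetical and are not part of
the (guix swh) module; POLL stands for any thunk that queries the status
and returns 'done, 'failed or 'pending.

```scheme
;; Hypothetical helper, NOT the (guix swh) API: poll a status thunk a
;; bounded number of times instead of waiting indefinitely.
(define* (wait-for-completion poll
                              #:key (max-attempts 10) (pause 0))
  "Call POLL, a thunk returning 'done, 'failed, or 'pending, at most
MAX-ATTEMPTS times, sleeping PAUSE seconds between calls.  Return 'done,
'failed, or 'timeout."
  (let loop ((attempt 1))
    (case (poll)
      ((done) 'done)
      ((failed) 'failed)
      (else
       (if (>= attempt max-attempts)
           'timeout                      ;give up: still pending
           (begin
             (sleep pause)               ;wait before asking again
             (loop (+ attempt 1))))))))
```

With such a bound, the caller can report a timeout to the user rather
than printing «Processing...» for hours.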

About this ’keyring’ branch: it could effectively live in a separate
repo, so why not actually do that? :-)  I mean, take the branch as it is
and mirror it in another Git repo saved on SWH; then fall back to that
repo if the ’keyring’ branch is not there.  I do not know…  Or simply
wait for SWH to improve things. :-)


> *Third, and this answers the asterisk above, we must keep in mind that
> this is content-addressability *with SHA1*.  Generating a chosen-prefix
> collision is becoming affordable³, so users absolutely need an additional
> mechanism to authenticate code they fetched.
>
> For origins, we have the content SHA256, so we’re fine.  For channels,
> we have Guix’s authentication mechanism¹, except it’s not available yet
> via SWH, as I wrote above.  For the footswitch example above using
> ‘--with-commit’, we don’t have any authentication method, but in fact,
> that’s the situation of Git repositories in general: they can rarely be
> authenticated.

How could a chosen-prefix attack work here?  I understand why a second
preimage attack is an issue.  But I fail to see how a SHA-1 chosen-prefix
attack could be exploited here to compromise the user, because the hash is
provided by this very same user.


> Ludovic Courtès (3):
>   swh: Support downloads of bare Git repositories.
>   git: 'update-cached-checkout' can fall back to SWH when cloning.
>   git: 'reference-available?' recognizes 'tag-or-commit'.

LGTM!

Cheers,
simon





* bug#44187: Channel clones lack SWH fallback
  2021-09-10 14:34     ` bug#44187: [PATCH 1/3] swh: Support downloads of bare Git repositories Ludovic Courtès
@ 2021-09-17 17:31       ` zimoun
  2021-09-18 10:05         ` Ludovic Courtès
  0 siblings, 1 reply; 15+ messages in thread
From: zimoun @ 2021-09-17 17:31 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 44187, Ludovic Courtès

Hi Ludo,

The patch LGTM although there is a redundancy, from my understanding.

On Fri, 10 Sep 2021 at 16:34, Ludovic Courtès <ludo@gnu.org> wrote:

> @@ -694,7 +714,15 @@ wait until it becomes available, which could take several minutes."
>       (format log-port "SWH: found revision ~a with directory at '~a'~%"
>               (revision-id revision)
>               (swh-url (revision-directory-url revision)))
> -     (swh-download-directory (revision-directory revision) output
> -                             #:log-port log-port))
> +     (swh-download-archive (match archive-type
> +                             ('flat
> +                              (string-append
> +                               "swh:1:dir:" (revision-directory revision)))
> +                             ('git-bare
> +                              (string-append
> +                               "swh:1:rev:" (revision-id revision))))

Here the ’swhid’ depends on the ’archive-type’…

> +                           output
> +                           #:archive-type archive-type

…which is also passed.  Then this is propagated.  For instance,
’swh-download-directory’:

> +(define* (swh-download-directory id output
> +                                 #:key (log-port (current-error-port)))
> +  "Download from Software Heritage the directory with the given ID, and
> +unpack it to OUTPUT.  Return #t on success and #f on failure."
> +  (swh-download-archive (string-append "swh:1:dir:" id) output
> +                        #:archive-type 'flat
> +                        #:log-port log-port))
> +

Does it make sense to pass this ’swhid’ equal to ’swh:1:rev’ with the
’flat’ archive-type?  Another instance is,

> +     (match (vault-fetch swhid
> +                         #:archive-type archive-type
> +                         #:log-port log-port)

and from my understanding, again ’swhid’ depends on ’archive-type’.
Therefore, it is error-prone.  The best option seems to be to pass
’(archive-type . swhid)’ and pattern-match on that.  Yeah, it potentially
breaks the public API… but there is no claim about stability (and I am
not convinced this (guix swh) module is used outside Guix :-)).
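
To make the idea concrete, here is a rough sketch of what matching on
such a pair could look like.  It is only an illustration: the procedure
name 'archive-spec->swhid' is hypothetical and does not exist in
(guix swh), and the pair carries the raw directory or revision ID rather
than a ready-made SWHID, so that the prefix is derived from the archive
type in exactly one place.

```scheme
(use-modules (ice-9 match))

;; Hypothetical helper, not part of (guix swh): SPEC is a pair
;; (ARCHIVE-TYPE . ID), where ID is a directory ID for 'flat and a
;; revision ID for 'git-bare.
(define (archive-spec->swhid spec)
  "Return the SWHID string corresponding to SPEC."
  (match spec
    (('flat . id)
     (string-append "swh:1:dir:" id))
    (('git-bare . id)
     (string-append "swh:1:rev:" id))
    (_
     (error "unsupported archive specification" spec))))

;; Example with the revision from the channel above:
;; (archive-spec->swhid
;;  '(git-bare . "67c9f2143aa6f545419ae913b4ae02af4cd3effc"))
```

A mismatched combination such as a revision SWHID with ’flat’ can then
no longer be expressed by the caller.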



Cheers,
simon





* bug#44187: Channel clones lack SWH fallback
  2021-09-17 17:31       ` bug#44187: Channel clones lack SWH fallback zimoun
@ 2021-09-18 10:05         ` Ludovic Courtès
  2021-09-18 10:27           ` zimoun
  0 siblings, 1 reply; 15+ messages in thread
From: Ludovic Courtès @ 2021-09-18 10:05 UTC (permalink / raw)
  To: zimoun; +Cc: 44187

Hi!

zimoun <zimon.toutoune@gmail.com> wrote:

> The patch LGTM although there is a redundancy, from my understanding.
>
> On Fri, 10 Sep 2021 at 16:34, Ludovic Courtès <ludo@gnu.org> wrote:
>
>> @@ -694,7 +714,15 @@ wait until it becomes available, which could take several minutes."
>>       (format log-port "SWH: found revision ~a with directory at '~a'~%"
>>               (revision-id revision)
>>               (swh-url (revision-directory-url revision)))
>> -     (swh-download-directory (revision-directory revision) output
>> -                             #:log-port log-port))
>> +     (swh-download-archive (match archive-type
>> +                             ('flat
>> +                              (string-append
>> +                               "swh:1:dir:" (revision-directory revision)))
>> +                             ('git-bare
>> +                              (string-append
>> +                               "swh:1:rev:" (revision-id revision))))
>
> Here the ’swhid’ depends on the ’archive-type’…
>
>> +                           output
>> +                           #:archive-type archive-type
>
> …which is also passed.  Then this is propagated.  For instance,
> ’swh-download-directory’:
>
>> +(define* (swh-download-directory id output
>> +                                 #:key (log-port (current-error-port)))
>> +  "Download from Software Heritage the directory with the given ID, and
>> +unpack it to OUTPUT.  Return #t on success and #f on failure."
>> +  (swh-download-archive (string-append "swh:1:dir:" id) output
>> +                        #:archive-type 'flat
>> +                        #:log-port log-port))
>> +
>
> Does it make sense to pass this ’swhid’ equal to ’swh:1:rev’ with the
> ’flat’ archive-type?  Another instance is,
>
>> +     (match (vault-fetch swhid
>> +                         #:archive-type archive-type
>> +                         #:log-port log-port)
>
> and from my understanding, again ’swhid’ depends on ’archive-type’.
> Therefore, it is error-prone.

‘git-bare’ only makes sense for a revision, not a directory, but I
wonder if ‘flat’ can be used for a revision (in which case it’d be
equivalent to getting the corresponding directory)?

I agree there’s some redundancy between directory/revision and
flat/git-bare, but it’s the SWH API that looks like this, so I’d be
tempted to just keep it as is.  Maybe we could ask for guidance on
#swh-devel.

Thanks!

Ludo’.





* bug#44187: Channel clones lack SWH fallback
  2021-09-18 10:05         ` Ludovic Courtès
@ 2021-09-18 10:27           ` zimoun
  0 siblings, 0 replies; 15+ messages in thread
From: zimoun @ 2021-09-18 10:27 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 44187

Hi,

On Sat, 18 Sept 2021 at 12:05, Ludovic Courtès <ludo@gnu.org> wrote:
> zimoun <zimon.toutoune@gmail.com> wrote:

> > Does it make sense to pass this ’swhid’ equal to ’swh:1:rev’ with the
> > ’flat’ archive-type?  Another instance is,

[...]

> > and from my understanding, again ’swhid’ depends on ’archive-type’.
> > Therefore, it is error-prone.
>
> ‘git-bare’ only makes sense for a revision, not a directory, but I

So it does not seem possible to form a 'swhid' as "swh:1:dir" and pass
'archive-type' as 'git-bare'.  And conversely with 'swh:1:rev' and
'flat'.  Right?
I have not tried though. :-)
If so, it means both arguments 'swhid' and 'archive-type' are linked,
so the function should accept only one unified argument and not two
independent ones.  IMHO.
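
For instance, if the mapping really is one-to-one (which I have not
verified, as said above), the archive type could be derived from the
SWHID itself, so only the SWHID needs to be passed around.  A sketch,
where 'swhid->archive-type' is a hypothetical name and not something
(guix swh) provides:

```scheme
;; Hypothetical helper, assuming 'flat always goes with "swh:1:dir:" and
;; 'git-bare always goes with "swh:1:rev:" (an unverified assumption).
(define (swhid->archive-type swhid)
  "Return the archive type implied by SWHID, a string."
  (cond ((string-prefix? "swh:1:dir:" swhid) 'flat)
        ((string-prefix? "swh:1:rev:" swhid) 'git-bare)
        (else (error "unexpected SWHID" swhid))))
```

If 'flat' turns out to be valid for a revision too, this derivation no
longer holds and an explicit argument would still be needed.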

> wonder if ‘flat’ can be used for a revision (in which case it’d be
> equivalent to getting the corresponding directory)?
>
> I agree there’s some redundancy between directory/revision and
> flat/git-bare, but it’s the SWH API that looks like this, so I’d be
> tempted to just keep it as is.  Maybe we could ask for guidance on
> #swh-devel.

Well, let's postpone the refactoring. :-)  However, if it works as I
understand, then the refactoring seems the correct way, so I would not
accept a backward-compatibility argument. ;-)

Have a nice week-end,
simon





* bug#44187: Channel clones lack SWH fallback
  2021-09-17  8:02 ` bug#44187: Channel clones lack SWH fallback zimoun
@ 2021-09-18 21:10   ` Ludovic Courtès
  2021-09-20  9:27     ` zimoun
  0 siblings, 1 reply; 15+ messages in thread
From: Ludovic Courtès @ 2021-09-18 21:10 UTC (permalink / raw)
  To: zimoun; +Cc: 44187-done

Hello!

zimoun <zimon.toutoune@gmail.com> wrote:

> The original URL of the channel was:
> <https://github.com/zimoun/channel-example.git>.  And this channel
> defines a package where the upstream has also disappeared
> <https://github.com/zimoun/hello-example.git>.  Note the URL in the
> package definition is not bogus… but using one was already working. :-)
>
> All is saved on SWH, so now all is transparent!  From my point of view,
> this is a killer feature for scientific folks. :-)

Yay!  Great that you came up with a nice example to test it on!

>> First, fallback is implemented only for fresh clones, not for updates.
>> Thus, if I rerun the first example, having now the clone in
>> ~/.cache/guix/checkouts, with a different commit, I get:
>
> SWH is not a forge but an archive. :-)  Therefore, this update case does
> not make sense to me.  I mean,
>
> $ git -C ~/.cache/guix/checkouts/6k7wvrcpbdsw3pje5b4squybw3jfn3viyrj7gcl7fipa5yjflaza fetch
> fatal: repository 'http://example.org/sdf/' not found

Right, that’s a reasonable limitation.

> Well, maybe the cached checkout could be removed when the commit is not
> found in it, and the fetch retried from SWH.  Obviously, the downdate
> case works.

It’s still useful to keep it cached around in case the user is going to
use it several times in a row.

> Note that on fresh clone, the error message could be improved:
>
> $ ./pre-inst-env guix build guix --with-git-url=guix=https://example.org --with-commit=guix=ff613c2b68aac539262822490448e637d8f315ba -n
> updating checkout of 'https://example.org'...
> guix build: error: Git failure while fetching https://example.org: unexpected http status code: 404
>
>
> where https://example.org is bogus and
> ff613c2b68aac539262822490448e637d8f315ba is not yet archived on SWH.  It
> could be nice to warn in addition to the 404 that it is not found in
> SWH.  WDYT?

Agreed; I’ve made this change (actually ‘swh-download’ prints something
upon failure since commit 60b42bec8413aa9844e625fb1903257f1bc1e55c, but
it looks more like a debugging message.)

> $ guix build guix --with-git-url=guix=https://example.org --with-commit=guix=c75b30d58f0becb0a5cd6a8bfe69d1063b0d1ada -n
> updating checkout of 'https://example.org'...
> SWH: found revision c75b30d58f0becb0a5cd6a8bfe69d1063b0d1ada with directory at 'https://archive.softwareheritage.org/api/1/directory/ca2e8a7222b4850c7bea935dff86b9c2a905efd6/'
> SWH vault: requested bundle cooking, waiting for completion...
> SWH vault: Processing...
> [...]
>
>
> then after several hours, I get this:
>
> SWH vault: failure: Internal Server Error. This incident will be reported.
> SWH vault: retrying...
> SWH vault: requested bundle cooking, waiting for completion...
> SWH vault: Processing...
>
> and after more than 12h, the status is still: «SWH vault: Processing...»
> and nothing is complete.

Did it eventually succeed?  We obviously have no guarantee as to how
long it might take to cook a bundle.

> About this ’keyring’ branch: it could effectively live in a separate
> repo, so why not actually do that? :-)  I mean, take the branch as it is
> and mirror it in another Git repo saved on SWH; then fall back to that
> repo if the ’keyring’ branch is not there.  I do not know…  Or simply
> wait for SWH to improve things. :-)

Yeah, they’re planning to support it eventually.

>> *Third, and this answers the asterisk above, we must keep in mind that
>> this is content-addressability *with SHA1*.  Generating a chosen-prefix
>> collision is becoming affordable³, so users absolutely need an additional
>> mechanism to authenticate code they fetched.

[...]

> How could a chosen-prefix attack work here?  I understand why a second
> preimage attack is an issue.  But I fail to see how a SHA-1 chosen-prefix
> attack could be exploited here to compromise the user, because the hash is
> provided by this very same user.

I think you’re right, it’s rather second-preimage attacks that would be
a serious problem.  My point is: as time passes, assuming that a SHA1
resolves to a single revision on SWH is becoming more and more
questionable.

>>   swh: Support downloads of bare Git repositories.
>>   git: 'update-cached-checkout' can fall back to SWH when cloning.
>>   git: 'reference-available?' recognizes 'tag-or-commit'.

I’ve pushed this after adding the warning as you suggested:

  dce2cf311b * git: 'reference-available?' recognizes 'tag-or-commit'.
  05f44c2d85 * git: 'update-cached-checkout' can fall back to SWH when cloning.
  6ec81c31c0 * swh: Support downloads of bare Git repositories.

Thanks a lot for reviewing and testing on real-world examples!

Ludo’.





* bug#44187: Channel clones lack SWH fallback
  2021-09-18 21:10   ` Ludovic Courtès
@ 2021-09-20  9:27     ` zimoun
  2021-09-22 10:03       ` Ludovic Courtès
  0 siblings, 1 reply; 15+ messages in thread
From: zimoun @ 2021-09-20  9:27 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 44187-done

Hi,

On Sat, 18 Sept 2021 at 23:10, Ludovic Courtès <ludo@gnu.org> wrote:
> zimoun <zimon.toutoune@gmail.com> wrote:

> > and after more than 12h, the status is still: «SWH vault: Processing...»
> > and nothing is complete.
>
> Did it eventually succeed?  We obviously have no guarantee as to how
> long it might take to cook a bundle.

No, I stopped.  And I reported it to #swh-devel.  It might be something
wrong on their side.
Yeah, cooking a bundle could take long... especially with a large repo
such as Guix (lots of commits and a couple of files).
I think it is ok to leave the code as it is now.


> >> *Third, and this answers the asterisk above, we must keep in mind that
> >> this is content-addressability *with SHA1*.  Generating a chosen-prefix
> >> collision is becoming affordable³, so users absolutely need an additional
> >> mechanism to authenticate code they fetched.
>
> [...]
>
> > How could a chosen-prefix attack work here?  I understand why a second
> > preimage attack is an issue.  But I fail to see how a SHA-1 chosen-prefix
> > attack could be exploited here to compromise the user, because the hash is
> > provided by this very same user.
>
> I think you’re right, it’s rather second-preimage attacks that would be
> a serious problem.  My point is: as time passes, assuming that a SHA1
> resolves to a single revision on SWH is becoming more and more
> questionable.

Well, SHA-1 has 2^160 (~10^48.2) possible values, to be compared with
10^50, the estimated number of atoms in the Earth.  Speaking about
content-addressability, SHA-1 seems fine.  However, for security, yeah
time flies. :-)
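
The orders of magnitude can be checked quickly at a Guile REPL.  This is
just the arithmetic behind the figures above; the variable names are
mine, and the 10^50 estimate is taken as given:

```scheme
;; 2^160 = 10^x  with  x = 160 * log10(2).
(define sha1-exponent
  (* 160 (/ (log 2) (log 10))))          ;about 48.16

;; How many times larger 10^50 is than 2^160; exact integers are used
;; first, then converted to a floating-point number.
(define atoms-per-hash
  (exact->inexact (/ (expt 10 50) (expt 2 160)))) ;roughly 68

(format #t "2^160 ~~ 10^~a~%" sha1-exponent)
(format #t "10^50 / 2^160 ~~ ~a~%" atoms-per-hash)
```

So the two numbers are within about two orders of magnitude of each
other.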


> >>   swh: Support downloads of bare Git repositories.
> >>   git: 'update-cached-checkout' can fall back to SWH when cloning.
> >>   git: 'reference-available?' recognizes 'tag-or-commit'.
>
> I’ve pushed this after adding the warning as you suggested:
>
>   dce2cf311b * git: 'reference-available?' recognizes 'tag-or-commit'.
>   05f44c2d85 * git: 'update-cached-checkout' can fall back to SWH when cloning.
>   6ec81c31c0 * swh: Support downloads of bare Git repositories.

Cool!  It would deserve a --news entry. ;-)

Cheers,
simon





* bug#44187: Channel clones lack SWH fallback
  2021-09-20  9:27     ` zimoun
@ 2021-09-22 10:03       ` Ludovic Courtès
  0 siblings, 0 replies; 15+ messages in thread
From: Ludovic Courtès @ 2021-09-22 10:03 UTC (permalink / raw)
  To: zimoun; +Cc: 44187-done

Hi,

zimoun <zimon.toutoune@gmail.com> wrote:

> On Sat, 18 Sept 2021 at 23:10, Ludovic Courtès <ludo@gnu.org> wrote:

[...]

>> > How could a chosen-prefix attack work here?  I understand why a second
>> > preimage attack is an issue.  But I fail to see how a SHA-1 chosen-prefix
>> > attack could be exploited here to compromise the user, because the hash is
>> > provided by this very same user.
>>
>> I think you’re right, it’s rather second-preimage attacks that would be
>> a serious problem.  My point is: as time passes, assuming that a SHA1
>> resolves to a single revision on SWH is becoming more and more
>> questionable.
>
> Well, SHA-1 has 2^160 (~10^48.2) possible values, to be compared with
> 10^50, the estimated number of atoms in the Earth.  Speaking about
> content-addressability, SHA-1 seems fine.  However, for security, yeah
> time flies. :-)

True!

>> >>   swh: Support downloads of bare Git repositories.
>> >>   git: 'update-cached-checkout' can fall back to SWH when cloning.
>> >>   git: 'reference-available?' recognizes 'tag-or-commit'.
>>
>> I’ve pushed this after adding the warning as you suggested:
>>
>>   dce2cf311b * git: 'reference-available?' recognizes 'tag-or-commit'.
>>   05f44c2d85 * git: 'update-cached-checkout' can fall back to SWH when cloning.
>>   6ec81c31c0 * swh: Support downloads of bare Git repositories.
>
> Cool!  It would deserve a --news entry. ;-)

That’s a good idea, I’ve added one.

Thanks,
Ludo’.





end of thread, other threads:[~2021-09-22 10:04 UTC | newest]

Thread overview: 15+ messages
2020-10-23 22:17 bug#44187: whishlist: time-machine --channel falls back to SWH zimoun
2021-03-05 14:51 ` Ludovic Courtès
2021-09-10 14:34   ` bug#44187: [PATCH 0/3] Fall back to Software Heritage (SWH) for Git clones Ludovic Courtès
2021-09-10 14:34     ` bug#44187: [PATCH 1/3] swh: Support downloads of bare Git repositories Ludovic Courtès
2021-09-17 17:31       ` bug#44187: Channel clones lack SWH fallback zimoun
2021-09-18 10:05         ` Ludovic Courtès
2021-09-18 10:27           ` zimoun
2021-09-10 14:34     ` bug#44187: [PATCH 2/3] git: 'update-cached-checkout' can fall back to SWH when cloning Ludovic Courtès
2021-09-10 14:34     ` bug#44187: [PATCH 3/3] git: 'reference-available?' recognizes 'tag-or-commit' Ludovic Courtès
2021-09-13 16:07     ` bug#44187: [PATCH 0/3] Fall back to Software Heritage (SWH) for Git clones zimoun
2021-09-14 13:37       ` Ludovic Courtès
2021-09-17  8:02 ` bug#44187: Channel clones lack SWH fallback zimoun
2021-09-18 21:10   ` Ludovic Courtès
2021-09-20  9:27     ` zimoun
2021-09-22 10:03       ` Ludovic Courtès
