From: zimoun <zimon.toutoune@gmail.com>
To: "Ludovic Courtès" <ludo@gnu.org>
Cc: 42162@debbugs.gnu.org, "Maurice Brémond" <Maurice.Bremond@inria.fr>
Subject: bug#42162: Recovering source tarballs
Date: Mon, 20 Jul 2020 17:52:09 +0200
Message-ID: <CAJ3okZ0iMNjv93MM1FkEB3_zXA48Rq3rKXhwwug85fNRRc41Mg@mail.gmail.com>
In-Reply-To: <87365mzil1.fsf@gnu.org>
Hi,
On Mon, 20 Jul 2020 at 10:39, Ludovic Courtès <ludo@gnu.org> wrote:
> zimoun <zimon.toutoune@gmail.com> skribis:
> > On Sat, 11 Jul 2020 at 17:50, Ludovic Courtès <ludo@gnu.org> wrote:
> There are many many comments in your message, so I took the liberty to
> reply only to the essence of it. :-)
Many comments because many open topics. ;-)
> However, the two examples above are good ideas as to the way forward: we
> could start a url-fetch-to-git-fetch migration in these two cases, and
> perhaps more.
Well, to be honest, I tried to probe such a migration when I opened
this thread:
https://lists.gnu.org/archive/html/guix-devel/2020-05/msg00224.html
and I tried to summarize the pros and cons here:
https://lists.gnu.org/archive/html/guix-devel/2020-05/msg00448.html
> > What about in addition push to IPFS? Feasible? Lookup issue?
>
> Lookup issue. :-) The hash in a CID is not just a raw blob hash.
> Files are typically chunked beforehand, assembled as a Merkle tree, and
> the CID is roughly the hash to the tree root. So it would seem we can’t
> use IPFS as-is for tarballs.
With the Git-repo map/table, it becomes an option, right?
Well, SWH would be one backend and IPFS could be another. Or any
"cloudy" storage system that could appear in the future, right?
> >> • If we no longer deal with tarballs but upstreams keep signing
> >> tarballs (not raw directory hashes), how can we authenticate our
> >> code after the fact?
> >
> > Does Guix automatically authenticate code using signed tarballs?
>
> Not automatically; packagers are supposed to authenticate code when they
> add a package (‘guix refresh -u’ does that automatically).
So I miss the point of keeping this authentication information for a
future where upstream has disappeared.
The authentication is done at packaging time. Once it is done,
merged into master and then pushed to SWH, being able to authenticate
again does not really matter.
And if it does matter, everything would have to be updated each time
vulnerabilities are discovered, so I am not sure SWH makes sense for
this use-case.
> But today, we store tarball hashes, not directory hashes.
We store what "guix hash" returns. ;-)
So it is easy to migrate from tarball hashes to something else. :-)
I mean, today it is "(sha256 (base32" and it would be easy to also have
"(sha256-tree (base32" or something like that,
at least in the case where the integrity hash is also used as the lookup key.
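The difference between a tarball hash and a directory hash can be shown
without Guix. The sketch below is only a toy: "tree_hash" is a made-up
helper and is not what "guix hash" computes; the file names are invented
too. It only illustrates that two archives of the same content have
different archive hashes but one content hash.

```shell
#!/bin/sh
# Hypothetical helper (not a Guix command): hash the content of a
# directory independently of how it was archived.
tree_hash () {
    ( cd "$1" && find . -type f | LC_ALL=C sort \
        | while IFS= read -r f; do sha256sum "$f"; done ) \
        | sha256sum | cut -d' ' -f1
}

# Build a tiny source tree, then archive it twice: plain and gzipped.
work=$(mktemp -d)
mkdir -p "$work/hello/src"
echo 'int main (void) { return 0; }' > "$work/hello/src/hello.c"
echo 'hello' > "$work/hello/README"
tar -C "$work" -cf "$work/hello.tar" hello
gzip -c "$work/hello.tar" > "$work/hello.tar.gz"

# Hashes of the archives themselves (what a tarball hash pins).
h_tar=$(sha256sum < "$work/hello.tar" | cut -d' ' -f1)
h_gz=$(sha256sum < "$work/hello.tar.gz" | cut -d' ' -f1)

# Hashes of the unpacked content (what a tree hash would pin).
mkdir "$work/from-tar" "$work/from-gz"
tar -C "$work/from-tar" -xf "$work/hello.tar"
gzip -dc "$work/hello.tar.gz" | tar -C "$work/from-gz" -xf -
t_tar=$(tree_hash "$work/from-tar")
t_gz=$(tree_hash "$work/from-gz")

[ "$h_tar" != "$h_gz" ] && echo "archive hashes differ"
[ "$t_tar" = "$t_gz" ] && echo "tree hashes agree"
rm -rf "$work"
```

With a content hash as the lookup key, the archive format stops
mattering for retrieval, which is the case discussed above.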
> > The format of metadata (disassemble) that you propose is schemish
> > (obviously! :-)) but we could propose something more JSON-like.
>
> Sure, if that helps get other people on-board, why not (though sexps
> have lived much longer than JSON and XML together :-)).
Lived much longer and still less less less used than JSON or XML alone. ;-)
I have not yet done a clean back-of-the-envelope computation. Roughly,
there are ~23 commits per day on average updating packages; say 70%
of them are url-fetch, that is ~16 new tarballs per day on average.
How will the model using a Git repo scale? Naively, the
output of "disassemble-archive" in full text (pretty-print format) for
hello-2.10.tar is 120KB, so 16*365*120KB = ~700MB per year,
without considering all the Git internals. Obviously, it depends on
the number of files and I do not know if hello is a representative
example.
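The arithmetic above can be reproduced with a few lines of shell. This
is only a sketch: the inputs (23 commits per day, 70% url-fetch, 120KB
per disassembled tarball) are the rough figures quoted above, and the
function name "estimate_kb_per_year" is made up for illustration.

```shell
#!/bin/sh
# Rough yearly growth, in KB, of a repository storing one text
# "disassembly" per new tarball.  All inputs are guesses taken from
# the message, not measurements.
estimate_kb_per_year () {
    commits_per_day=$1     # package-updating commits per day
    url_fetch_percent=$2   # share of them using url-fetch, in percent
    kb_per_tarball=$3      # size of one disassembled tarball, in KB

    # Integer arithmetic: the fractional part of tarballs/day is dropped.
    tarballs_per_day=$(( commits_per_day * url_fetch_percent / 100 ))
    echo $(( tarballs_per_day * 365 * kb_per_tarball ))
}

kb=$(estimate_kb_per_year 23 70 120)
echo "${kb} KB per year, i.e. about $(( kb / 1000 )) MB"
```

Changing the third argument shows how sensitive the estimate is to the
per-tarball size, which is the open question about whether hello is
representative.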
And I do not know how Git handles binary files, should the disassembled
tarball be stored as a .go file or in any other format.
All the best,
simon
ps:
In case someone wants to check where my estimates come from:
--8<---------------cut here---------------start------------->8---
for ci in $(git log --after=v1.0.0 --oneline \
            | grep "gnu:" | grep -E "(Add|Update)" \
            | cut -f1 -d' ')
do
    git --no-pager log -1 $ci --format="%cs"
done | uniq -c > /tmp/commits

guix environment --ad-hoc r-minimal \
     -- R -e 'summary(read.table("/tmp/commits"))'

gzip -dc < $(guix build -S hello) > /tmp/hello.tar

guix repl -L /tmp/tar/
scheme@(guix-user)> (call-with-input-file "hello.tar"
                      (lambda (port)
                        (disassemble-archive port)))
--8<---------------cut here---------------end--------------->8---