From: Simon Tournier <zimon.toutoune@gmail.com>
To: "Ludovic Courtès" <ludo@gnu.org>, "Ekaitz Zarraga" <ekaitz@elenq.tech>
Cc: Attila Lendvai <attila@lendvai.name>,
	Giovanni Biscuolo <g@xelera.eu>, Guix Devel <guix-devel@gnu.org>
Subject: 3 kinds of bootstrap (was Re: backdoor injection via release tarballs combined with binary artifacts)
Date: Tue, 07 May 2024 20:22:22 +0200
Message-ID: <87msp11nz5.fsf@gmail.com>
In-Reply-To: <87wmp5l3r3.fsf@gnu.org>

Hi,

I am late to the party…


On mer., 10 avril 2024 at 15:57, Ludovic Courtès <ludo@gnu.org> wrote:

>> That has happened to me too.
>> Why not use Git directly always?
>
> Because it create{s,d} a bootstrapping issue.  The
> “builtin:git-download” method was added only recently to guix-daemon and
> cannot be assumed to be available yet:
>
>   https://issues.guix.gnu.org/65866

[...]

> I think we should gradually move to building everything from
> source—i.e., fetching code from VCS and adding Autoconf & co. as inputs.
>
> This has been suggested several times before.  The difficulty, as you
> point out, will lie in addressing bootstrapping issues with core
> packages: glibc, GCC, Binutils, Coreutils, etc.  I’m not sure how to do
> that but…

[...]

> … live-bootstrap can probably be a good source of inspiration to find a
> way to build those core packages (or some of them) straight from a VCS
> checkout.

IMHO, we need to distinguish three kinds of bootstrapping issues,
because each calls for different potential workarounds. :-)

  1. Bootstrap how to download source code.
  2. Bootstrap how to build core packages.
  3. Bootstrap the driver (say guix-daemon and helpers).

Well, having solutions for #1 and #3 would naturally provide a solution
for #2.  Although the devil is in the details. ;-)


About #1
========

You cannot use the binary ’git’ to download the source code of Git in
order to build the binary ’git’.  Yeah, circular dependency. :-)
Therefore, the Git source code is pulled using another method, say from
a tarball; that method also needs to be built from source, so it in
turn needs yet another method.  The usual chicken-and-egg problem.
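
As a rough illustration of that cycle, here is a small Python sketch
(not Guix code).  The graph, the node names and the helper `find_cycle`
are all made up for the example; it only shows that the naive setup
loops back on ’git’ while a builtin downloader cuts the loop.

```python
# Hypothetical dependency graphs; names and edges are illustrative only.
def find_cycle(graph, start):
    """Return the nodes of a cycle reachable from start, or None."""
    path = []          # current depth-first path
    on_path = set()    # same nodes, for fast membership tests
    visited = set()    # nodes already fully explored

    def visit(node):
        if node in on_path:
            # We came back to a node of the current path: that is a cycle.
            return path[path.index(node):] + [node]
        if node in visited:
            return None
        visited.add(node)
        path.append(node)
        on_path.add(node)
        for dep in graph.get(node, []):
            cycle = visit(dep)
            if cycle:
                return cycle
        path.pop()
        on_path.discard(node)
        return None

    return visit(start)


# Naive setup: building git needs its source, which is fetched by a
# method that itself needs git.
naive = {
    "git": ["git-source"],
    "git-source": ["git-fetch"],
    "git-fetch": ["git"],
}

# Cycle cut: the source comes from a downloader built into the daemon.
cut = {
    "git": ["git-source"],
    "git-source": ["builtin:download"],
    "builtin:download": [],
}

print(find_cycle(naive, "git"))
print(find_cycle(cut, "git"))
```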

The current workaround is to “hide” the problem and introduce a
“builtin:download” method: it’s an “opaque” binary that is hard to
inspect.  Roughly, the workaround was introduced by [1] in Oct. 2016.
Almost 8 years ago, so it works! :-)

The argument for accepting this “opaque” method is that it is a
fixed-output derivation.  In other words, we know the SHA256 checksum
beforehand.  Thus the claim is: being “opaque” does not matter because
the SHA256 checksum can be computed independently and all the source
code can be audited.
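
To make the fixed-output argument concrete, here is a minimal Python
sketch (again, not Guix code).  The three fetchers and the function
names are stand-ins invented for the example, and the checksum is a
plain hex digest for simplicity.  The point is only that the content is
accepted or rejected by its checksum, whichever fetcher produced it.

```python
import hashlib


def sha256_hex(content: bytes) -> str:
    """Hex-encoded SHA256 of the given bytes."""
    return hashlib.sha256(content).hexdigest()


def accept_fixed_output(content: bytes, expected_sha256: str) -> bool:
    """Accept fetched content only if it matches the recorded checksum."""
    return sha256_hex(content) == expected_sha256


# Stand-ins for fetch methods; real ones would talk to the network.
def transparent_fetch() -> bytes:
    return b"source code of foo 1.0\n"


def opaque_fetch() -> bytes:
    return b"source code of foo 1.0\n"


def tampered_fetch() -> bytes:
    return b"source code of foo 1.0 plus a backdoor\n"


# The checksum is recorded ahead of time, computed independently of
# whichever fetcher runs later.
EXPECTED = sha256_hex(b"source code of foo 1.0\n")

for fetch in (transparent_fetch, opaque_fetch, tampered_fetch):
    print(fetch.__name__, accept_fixed_output(fetch(), EXPECTED))
```

Note what the check does not tell you: it says nothing about which
fetcher ran, only that the bytes are the expected ones.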

To cut another cycle, another “opaque” method has been introduced:
“builtin:git-download”.  The same reasoning applies.

Do not get me wrong with “opaque”.  I mean that the method depends on
the pair of user revision and daemon revision.  In other words, it is
not straightforward to know whether Alice and Bob are using the exact
same method for downloading source code.  Since it is not fully
transparent, it is “opaque”. :-)

Somehow we are applying to everything what we only need for cutting a
specific circular dependency.  We have some packages named
’foo-bootstrap’ that are meant to solve some dependency problem between
packages, yet we do not use them everywhere; we just use them for
cutting a circular dependency.  I think a similar strategy should be
applied to the fetch methods.

We could have “git-fetch” relying on the initial Git method, i.e., a
transparent derivation where it’s straightforward to audit everything:
the dependencies and the builder.

And for some specific cases, we could have “git-fetch/bootstrap” relying
on “builtin:git-download”.  That makes it easy to know which packages
need special care.

I think that applying “builtin:download” and “builtin:git-download” to
all “url-fetch” and “git-fetch” downgrades the overall level of
transparency in order to solve a very specific bootstrapping problem.

Last about #1, please note that transparency does not come for free and
has drawbacks: when running say “guix time-machine -C past.scm --
build -S”, all the dependencies for downloading would be the ones of
past.scm.  In other words, to download today the source code of a
5-year-old package, say using ’hg-fetch’, we would need Python and
Mercurial as they were 5 years ago, even though we do not expect the
content to differ from what the Python and Mercurial of today would
fetch.


About #3
========

That’s the very hard topic!  The bootstrapping story is not fully done
yet.

Assuming trust for #1, the bootstrap of Guix starts with
’bootstrap-seeds’, roughly 232KiB.  Take a moment, that’s impressive, :-)
right?

Obviously, I leave aside Haskell, OCaml@5, etc.

Well, diving further.  These 232KiB alone are not enough.  It also
requires helpers: tar (1.3MiB), bash (1.3MiB), mkdir (0.7MiB) and xz
(0.844MiB).

Moreover, it requires two drivers: a static Guile binary (14MiB) and
guix-daemon.
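
Adding up the figures quoted above gives a sense of proportion: the
helpers come to about 4.1MiB and, with the Guile binary, the total is a
bit over 18MiB, so the seeds are only a small fraction of what has to be
trusted.  A quick Python sketch of that arithmetic (guix-daemon is left
out because no size is given for it; variable names are mine):

```python
# Sizes as quoted in this message; guix-daemon is omitted because no
# size is given for it.
KIB_PER_MIB = 1024

seeds_mib = 232 / KIB_PER_MIB
helpers_mib = {"tar": 1.3, "bash": 1.3, "mkdir": 0.7, "xz": 0.844}
guile_mib = 14.0

helpers_total = sum(helpers_mib.values())
trusted_total = seeds_mib + helpers_total + guile_mib

print(f"seeds:   {seeds_mib:.3f} MiB")
print(f"helpers: {helpers_total:.3f} MiB")
print(f"total:   {trusted_total:.3f} MiB (without guix-daemon)")
print(f"seeds are {100 * seeds_mib / trusted_total:.1f}% of that total")
```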

You get it: How to trust these helpers?  Two approaches: (a) implement
something directly in hex/assembler and/or (b) exploit the Guile binary
(à la Scheme on bare metal).

About guix-daemon, one solution is a daemon written directly in Guile
and compatible with that very Guile binary.  Or at least, a minimalist
daemon with just enough features for building up to guix-daemon.

Or another option is the “Extreme bootstrapping” [3], which is my
understanding of live-bootstrap.  Somehow, remove guix-daemon from the
picture and convert the derivation (the one read by guix-daemon) into a
minimal Guile script that would be executed during startup.  See the
proof-of-concept in the branch wip-system-bootstrap [4].


Just my lengthy opinion… Or maybe some ideas for GSoC. ;-)


1: https://issues.guix.gnu.org/22774#3
2: https://guix.gnu.org/en/blog/2023/the-full-source-bootstrap-building-from-source-all-the-way-down
3: https://guix.gnu.org/en/blog/2019/reproducible-builds-summit-5th-edition
4: https://git.savannah.gnu.org/cgit/guix.git/log/?h=wip-system-bootstrap


Cheers,
simon

