Hello everybody,

I know for sure that Guix maintainers and developers are working on
this; I'm just asking that we find some time to inform users and
possibly discuss with them (also in guix-devel) what measures GNU Guix -
the software distribution - can/should deploy to try to avoid this kind
of attack.

Please consider that this (sub)thread is _not_ specific to xz-utils but
to the specific attack vector (matrix?) used to inject a backdoor into a
binary during a build phase, in a _very_ stealthy way.  Also, since Guix
_is_ downstream, I'd like this (sub)thread to concentrate on what *Guix*
can/should do to strengthen the build process /independently/ of what
upstreams (or other distributions) can/should do.

First of all, I understand the xz backdoor attack was complex (both
socially and technically) and all the details are still under scrutiny,
but AFAIU the way the backdoor was injected - by "infecting" the
**build phase** of the software and obfuscating the payload in binaries
- is very alarming, and is something all distributions aiming at
reproducible builds must examine very carefully (and they actually
_are_ doing so).

John Kehayias writes:

[...]

> On Fri, Mar 29, 2024 at 01:39 PM, Felix Lechner via Reports of
> security issues in Guix itself and in packages provided by Guix wrote:
>
>> Hi Ryan,
>>
>> On Fri, Mar 29 2024, Ryan Prior wrote:

[...]

>>> Guix currently packages xz-utils 5.2.8 as "xz" using the upstream
>>> tarball. [...] Should we switch from using upstream tarballs to some
>>> fork with more responsible maintainers?
>>
>> Guix's habit of building from tarballs is a poor idea because tarballs
>> often differ.

First of all: should software be considered reproducible when it
produces different binaries depending on whether it is compiled from the
source code repository (Git or otherwise) or from the officially
released source tarball?  My first thought is no.

>> For example, maintainers may choose to ship a ./configure script that
>> is otherwise not present in Git (although a configure.ac might be).
>> Guix should build from Git.

Two useful pointers explaining how the backdoor was injected are [1]
(general workflow) and [2] (payload obfuscation).

The first and *indispensable* condition for the attack to be successful
is this:

--8<---------------cut here---------------start------------->8---
* The release tarballs upstream publishes don't have the same code that
  GitHub has. This is common in C projects so that downstream consumers
  don't need to remember how to run autotools and autoconf. The version
  of build-to-host.m4 in the release tarballs differs wildly from the
  upstream on GitHub.

[...]

* Explain dist tarballs, why we use them, what they do, link to
  autotools docs, etc

* "Explaining the history of it would be very helpful I think. It also
  explains how a single person was able to insert code in an open source
  project that no one was able to peer review. It is pragmatically
  impossible, even if technically possible once you know the problem is
  there, to peer review a tarball prepared in this manner."
--8<---------------cut here---------------end--------------->8---

(from [1])

Let me highlight this: «It is pragmatically impossible [...] to peer
review a tarball prepared in this manner.»

There is no doubt that a release tarball is a much weaker "trusted
source" (trusted by peer review, not by authority) than the upstream
DVCS repository.

It's *very* noteworthy that the backdoor was discovered thanks to a
performance issue and _not_ during a peer review of the source code...
and the _build_ code *is* source code, no?
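As a very rough illustration of why that difference matters, and of how
a reviewer (or an automated check) might surface it, here is a minimal
sketch in Guile of a script that unpacks a release tarball next to a
shallow clone of the corresponding Git tag and diffs the two trees.
This is *not* an existing Guix tool; the tarball name, repository URL,
tag and script name are hypothetical placeholders, and it could be run
with something like `guix repl -- compare-release.scm`:

--8<---------------cut here---------------start------------->8---
;; Minimal sketch, *not* an existing Guix tool: unpack a release tarball
;; next to a shallow clone of the corresponding Git tag and report any
;; difference between the two trees.  Files present only in the tarball
;; (generated configure scripts, modified .m4 macros, ...) are exactly
;; the ones that deserve manual review.  All names are hypothetical.
(use-modules (guix build utils))        ;for invoke and mkdir-p

(define tarball "xz-x.y.z.tar.gz")      ;hypothetical release tarball
(define repo "https://example.org/xz.git") ;hypothetical upstream repo
(define tag "vX.Y.Z")                   ;hypothetical release tag

(mkdir-p "from-tarball")
(invoke "tar" "-xf" tarball "-C" "from-tarball" "--strip-components=1")
(invoke "git" "clone" "--depth" "1" "--branch" tag repo "from-git")

;; 'diff' exits non-zero when the trees differ, so call it via system*
;; (invoke would raise an exception) and merely report the outcome.
(let ((code (status:exit-val
             (system* "diff" "-r" "--brief" "--exclude=.git"
                      "from-git" "from-tarball"))))
  (unless (zero? code)
    (format #t "tarball and Git tag differ, please review the list above~%")))
--8<---------------cut here---------------end--------------->8---

Of course this only tells us *that* the trees differ, which for
Autotools projects they always will; the hard part, as [1] explains, is
reviewing the generated files that legitimately exist only in the
tarball.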
It's not the first time a source release tarball of free software has
been compromised [3], but the way the compromise worked in this case is
something new (or at least never spotted before, right?).

> We discussed a bit on #guix today about this. A movement to sourcing
> more directly from Git in general has been discussed before, though
> has some hurdles.

Could someone knowledgeable about the details please describe what the
hurdles are when sourcing from a DVCS (possibly other than Git)?

> I will let someone more knowledgeable about the details chime in, but
> yes, something we should do.

I'm definitely _not_ the knowledgeable one, but I'd like to share the
results of my research.

Is it possible to enhance our build system(s) (e.g. gnu-build-system) so
that they /ignore/ pre-built .m4 files and similar scripts, and rebuild
them during the build process?

Richard W.M. Jones proposed this on the fedora-devel ML [4]:

--8<---------------cut here---------------start------------->8---
(1) We should routinely delete autoconf-generated cruft from upstream
projects and regenerate it in %prep.  It is easier to study the real
source rather than dig through the convoluted, generated shell script in
an upstream './configure' looking for back doors.  For most projects,
just running "autoreconf -fiv" is enough.
--8<---------------cut here---------------end--------------->8---

There is an interesting bug report [5] about autoreconf:

--8<---------------cut here---------------start------------->8---
While analyzing the recent xz backdoor hook into the build system [A], I
noticed that one of the aspects why the hook worked was because it seems
like «autoreconf -f -i» (that is run in Debian as part of dh-autoreconf
via dh) still seems to take the serial into account, which was bumped in
the tampered .m4 file.

If either the gettext.m4 had gotten downgraded (to the version currently
in Debian, which would not have pulled the tampered build-to-host.m4),
or once Debian upgrades gettext, the build-to-host.m4 would get
downgraded to the upstream clean version, then the hook would have been
disabled and the backdoor would be inert. (Of course at that point the
malicious actor would have found another way to hook into the build
system, but the less avenues there are the better.)

I've tried to search the list and checked for old bug reports on the
debbugs.gnu.org site, but didn't notice anything. To me this looks like
a very unexpected behavior, but it's not clear whether this is
intentional or a bug.

In any case regardless of either position, it would be good to improve
this (either by fixing --force to force things even if downgrading, or
otherwise perhaps to add a new option to really force everything).
--8<---------------cut here---------------end--------------->8---

So AFAIU, using a fixed "autoreconf -fi" should mitigate the risk of
tampered .m4 macros (and of other possibly tampered build configuration
scripts)?

IMHO "ignoring" (i.e. deleting) pre-built build scripts in the Guix
build system(s) should be considered... or is it /already/ done?  (A
rough sketch of what such phases might look like follows the next
quoted message.)

Also, I found this thread [6] interesting, especially this message from
Jacob Bachmeyer:

--8<---------------cut here---------------start------------->8---
The *user* could catch issues like this backdoor, since the backdoor
appears (based on what I have read so far) to materialize certain object
files while configure is running, while `find . -iname '*.o'` /should/
return nothing before make is run.  This also suggests that running
"make clean" after configure would kill at least this backdoor.
--8<---------------cut here---------------end--------------->8---

Something to apply in Guix's gnu-build-system?
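To make these two ideas (regenerating the Autotools machinery, and
refusing pre-built objects) a bit more concrete, here is a minimal
sketch.  This is *not* existing gnu-build-system code: the phase names
and the file list are mine, it assumes autoconf, automake and friends
are among the native-inputs, and it will surely break packages whose
build system does not regenerate cleanly:

--8<---------------cut here---------------start------------->8---
(arguments
 (list
  #:phases
  #~(modify-phases %standard-phases
      ;; Hypothetical phase: drop the pre-generated Autotools files
      ;; shipped in the tarball and rebuild them from configure.ac and
      ;; Makefile.am, as proposed in [4].
      (add-after 'unpack 'regenerate-autotools
        (lambda _
          (for-each (lambda (file)
                      (when (file-exists? file)
                        (delete-file file)))
                    '("configure" "aclocal.m4" "Makefile.in"))
          (invoke "autoreconf" "-vfi")))
      ;; Hypothetical phase: per the observation above, no object files
      ;; should exist before 'make' has run; abort if some do.
      (add-after 'configure 'assert-no-prebuilt-objects
        (lambda _
          (let ((objects (find-files "." "\\.(o|lo|a)$")))
            (unless (null? objects)
              (error "object files present before the build phase:"
                     objects))))))))
--8<---------------cut here---------------end--------------->8---

Whether phases like these belong in individual packages or in
gnu-build-system itself, and how many packages would fail to regenerate
their build system, is probably part of the hurdles mentioned above.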
He also writes:

--8<---------------cut here---------------start------------->8---
A *very* observant (unreasonably so) user might notice that "make" did
not build the objects that the backdoor provided.
--8<---------------cut here---------------end--------------->8---

Is there a way to enhance gnu-build-system so that it notices when some
object was not built by make?

He then goes on explaining:

--8<---------------cut here---------------start------------->8---
Of course, an attacker could sneak around this as well by moving the
process for unpacking the backdoor object to a Makefile rule, but that
is more likely to "stick out" to an observant user, as well as being an
easy target for automated analysis ("Which files have 'special'
rules?") since you cannot obfuscate those from make(1) and expect them
to still work.
--8<---------------cut here---------------end--------------->8---

Given the above observation that «it is pragmatically impossible [...]
to peer review a tarball prepared in this manner», I strongly doubt that
a Makefile tampered with _in_the_release_tarball_ would be easy to peer
review; so I'd ask: would such an "automated analysis" (see above) be
feasible in a dedicated build-system phase?  A rough sketch of one
possible detection heuristic follows.
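For the first question - noticing objects that make did not build - here
is a minimal sketch, again *not* existing gnu-build-system code but
hypothetical phases to be spliced into %standard-phases as in the
previous sketch.  It relies on file modification times, which an
attacker could of course also forge, so it is only a heuristic: record a
timestamp just before the 'build phase and afterwards flag any object
file that is older than it.

--8<---------------cut here---------------start------------->8---
;; Hypothetical phases, a sketch only: flag object files whose mtime
;; predates the start of the 'build phase, i.e. files that were *not*
;; (re)built by make but materialized earlier (e.g. during configure).
(add-before 'build 'record-build-start
  (lambda _
    ;; Create an empty marker file whose mtime is "now".
    (call-with-output-file ".guix-build-start"
      (lambda (port) #t))))
(add-after 'build 'flag-objects-not-built-by-make
  (lambda _
    (let ((start (stat:mtime (stat ".guix-build-start"))))
      (for-each (lambda (object)
                  (when (< (stat:mtime (stat object)) start)
                    (format #t "warning: ~a was not rebuilt by make~%"
                            object)))
                (find-files "." "\\.(o|lo|a)$")))))
--8<---------------cut here---------------end--------------->8---

For the second question («which files have 'special' rules?») perhaps
the database printed by GNU make's --print-data-base option could be
post-processed in a similar phase, but I have no sketch for that.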
Anyway, I'm also asking myself: would a release tarball that *possibly
differs from the official code in the DVCS* but carries a *valid* GPG
signature (please see [3]) really have been peer reviewed, or is that
«pragmatically impossible» too?  In other words: what if the backdoor
had been injected directly into the source code of the *official*
release tarball, signed with a valid GPG signature (and obviously with a
valid sha256 hash)?  Do upstream developer communities peer review
release tarballs, or do they "just" peer review the code in the official
DVCS?

Also, in (info "(guix) origin Reference") I see that Guix packages can
have a list of URIs for the origin of the source code; see xz as an
example [7].  Are these intended to be multiple independent sources to
be compared with one another in order to detect possible tampering, or
are they "just" alternatives to be used when the first listed URI is
unavailable?  If the former, a mitigation would be to specify multiple
independent release sources for each package, so that it would be harder
to compromise all of them; but the availability of such independent
sources is not something under Guix's control.

All in all: should we really avoid release tarballs that are
«pragmatically impossible» to peer review?

WDYT?

Happy hacking! Gio'

[...]

[1] https://gist.github.com/thesamesam/223949d5a074ebc3dce9ee78baad9e27
    «FAQ on the xz-utils backdoor (CVE-2024-3094)» (constantly updated)

[2] https://gynvael.coldwind.pl/?lang=en&id=782
    «xz/liblzma: Bash-stage Obfuscation Explained»

[3] e.g. https://web.archive.org/web/20110708023004/http://www.h-online.com/open/news/item/Vsftpd-backdoor-discovered-in-source-code-update-1272310.html
    «Vsftpd backdoor discovered in source code - update»
    "a bad tarball had been downloaded from the vsftpd master site with
    an invalid GPG signature"

[4] https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/YWMNOEJ34Q7QLBWQAB5TM6A2SVJFU4RV/
    «Three steps we could take to make supply chain attacks a bit harder»

[5] https://lists.gnu.org/archive/html/bug-autoconf/2024-03/msg00000.html

[6] https://lists.gnu.org/archive/html/automake/2024-03/msg00007.html
    «GNU Coding Standards, automake, and the recent xz-utils backdoor»

[7] https://git.savannah.gnu.org/cgit/guix.git/tree/gnu/packages/compression.scm#n494

--8<---------------cut here---------------start------------->8---
(define-public xz
  (package
    (name "xz")
    (version "5.2.8")
    (source (origin
              (method url-fetch)
              (uri (list (string-append "http://tukaani.org/xz/xz-"
                                        version ".tar.gz")
                         (string-append "http://multiprecision.org/guix/xz-"
                                        version ".tar.gz")))
--8<---------------cut here---------------end--------------->8---

P.S.: in a way, I see this kind of attack as exploiting a form of
statefulness of the build system; in this case "build-to-host.m4" was
/state/.  I think that build systems, too, should be stateless, and Guix
is doing a great job of reaching this goal.

-- 
Giovanni Biscuolo

Xelera IT Infrastructures