Hello everybody,

I know for sure that Guix maintainers and developers are working on
this; I'm just asking that we find some time to inform users and
possibly discuss with them (also in guix-devel) what measures GNU Guix -
the software distribution - can/should deploy to try to avoid this kind
of attack.

Please consider that this (sub)thread is _not_ specific to xz-utils but
to the specific attack vector (matrix?) used to inject a backdoor into a
binary during a build phase, in a _very_ stealthy way.  Also, since Guix
_is_ downstream, I'd like this (sub)thread to concentrate on what *Guix*
can/should do to strengthen the build process /independently/ of what
upstreams (or other distributions) can/should do.

First of all, I understand the xz backdoor attack was complex (both
socially and technically) and all the details are still under scrutiny,
but AFAIU the way the backdoor was injected - by "infecting" the
**build phase** of the software and obfuscating the payload in binaries
- is very alarming, and is something all distributions aiming at
reproducible builds must examine very carefully (and they actually
_are_ doing so).

John Kehayias writes:

[...]

> On Fri, Mar 29, 2024 at 01:39 PM, Felix Lechner via Reports of
> security issues in Guix itself and in packages provided by Guix wrote:
>
>> Hi Ryan,
>>
>> On Fri, Mar 29 2024, Ryan Prior wrote:

[...]

>>> Guix currently packages xz-utils 5.2.8 as "xz" using the upstream
>>> tarball. [...] Should we switch from using upstream tarballs to some
>>> fork with more responsible maintainers?
>>
>> Guix's habit of building from tarballs is a poor idea because tarballs
>> often differ.

First of all: should software be considered reproducible when it
produces different binaries depending on whether it is compiled from the
source code repository (Git or otherwise) or from the officially
released source tarball?  My first thought is no.

>> For example, maintainers may choose to ship a ./configure script that
>> is otherwise not present in Git (although a configure.ac might be).
>> Guix should build from Git.

Two useful pointers explaining how the backdoor was injected are [1]
(general workflow) and [2] (payload obfuscation).

The first and *indispensable* condition for the attack to be successful
is this:

--8<---------------cut here---------------start------------->8---
* The release tarballs upstream publishes don't have the same code that
  GitHub has. This is common in C projects so that downstream consumers
  don't need to remember how to run autotools and autoconf. The version
  of build-to-host.m4 in the release tarballs differs wildly from the
  upstream on GitHub.

[...]

* Explain dist tarballs, why we use them, what they do, link to
  autotools docs, etc

* "Explaining the history of it would be very helpful I think. It also
  explains how a single person was able to insert code in an open source
  project that no one was able to peer review. It is pragmatically
  impossible, even if technically possible once you know the problem is
  there, to peer review a tarball prepared in this manner."
--8<---------------cut here---------------end--------------->8---

(from [1])

Let me highlight this: «It is pragmatically impossible [...] to peer
review a tarball prepared in this manner.»

There is no doubt that a release tarball is a much weaker "trusted
source" (trusted by peer review, not by authority) than the upstream
DVCS repository.

It's *very* noteworthy that the backdoor was discovered thanks to a
performance issue and _not_ during a peer review of the source code...
and the _build_ code *is* source code, no?
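As a very rough illustration of why that difference matters, and of how
a reviewer (or an automated check) might surface it, here is a minimal
sketch in Guile of a script that unpacks a release tarball next to a
shallow clone of the corresponding Git tag and diffs the two trees.
This is *not* an existing Guix tool; the tarball name, repository URL,
tag and script name are hypothetical placeholders, and it could be run
with something like `guix repl -- compare-release.scm`:

--8<---------------cut here---------------start------------->8---
;; Minimal sketch, *not* an existing Guix tool: unpack a release tarball
;; next to a shallow clone of the corresponding Git tag and report any
;; difference between the two trees.  Files present only in the tarball
;; (generated configure scripts, modified .m4 macros, ...) are exactly
;; the ones that deserve manual review.  All names are hypothetical.
(use-modules (guix build utils))        ;for invoke and mkdir-p

(define tarball "xz-x.y.z.tar.gz")      ;hypothetical release tarball
(define repo "https://example.org/xz.git") ;hypothetical upstream repo
(define tag "vX.Y.Z")                   ;hypothetical release tag

(mkdir-p "from-tarball")
(invoke "tar" "-xf" tarball "-C" "from-tarball" "--strip-components=1")
(invoke "git" "clone" "--depth" "1" "--branch" tag repo "from-git")

;; 'diff' exits non-zero when the trees differ, so call it via system*
;; (invoke would raise an exception) and merely report the outcome.
(let ((code (status:exit-val
             (system* "diff" "-r" "--brief" "--exclude=.git"
                      "from-git" "from-tarball"))))
  (unless (zero? code)
    (format #t "tarball and Git tag differ, please review the list above~%")))
--8<---------------cut here---------------end--------------->8---

Of course this only tells us *that* the trees differ, which for
Autotools projects they always will; the hard part, as [1] explains, is
reviewing the generated files that legitimately exist only in the
tarball.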
It's not the first time a source release tarball of free software has
been compromised [3], but the way the compromise worked in this case is
something new (or at least never spotted before, right?).

> We discussed a bit on #guix today about this. A movement to sourcing
> more directly from Git in general has been discussed before, though
> has some hurdles.

Could someone knowledgeable about the details please describe what the
hurdles are when sourcing from a DVCS (possibly other than Git)?

> I will let someone more knowledgeable about the details chime in, but
> yes, something we should do.

I'm definitely _not_ the knowledgeable one, but I'd like to share the
results of my research.

Is it possible to enhance our build system(s) (e.g. gnu-build-system) so
that they /ignore/ pre-built .m4 files and similar scripts, and rebuild
them during the build process?

Richard W.M. Jones proposed this on the fedora-devel ML [4]:

--8<---------------cut here---------------start------------->8---
(1) We should routinely delete autoconf-generated cruft from upstream
projects and regenerate it in %prep.  It is easier to study the real
source rather than dig through the convoluted, generated shell script in
an upstream './configure' looking for back doors.  For most projects,
just running "autoreconf -fiv" is enough.
--8<---------------cut here---------------end--------------->8---

There is an interesting bug report [5] about autoreconf:

--8<---------------cut here---------------start------------->8---
While analyzing the recent xz backdoor hook into the build system [A], I
noticed that one of the aspects why the hook worked was because it seems
like «autoreconf -f -i» (that is run in Debian as part of dh-autoreconf
via dh) still seems to take the serial into account, which was bumped in
the tampered .m4 file.

If either the gettext.m4 had gotten downgraded (to the version currently
in Debian, which would not have pulled the tampered build-to-host.m4),
or once Debian upgrades gettext, the build-to-host.m4 would get
downgraded to the upstream clean version, then the hook would have been
disabled and the backdoor would be inert. (Of course at that point the
malicious actor would have found another way to hook into the build
system, but the less avenues there are the better.)

I've tried to search the list and checked for old bug reports on the
debbugs.gnu.org site, but didn't notice anything. To me this looks like
a very unexpected behavior, but it's not clear whether this is
intentional or a bug.

In any case regardless of either position, it would be good to improve
this (either by fixing --force to force things even if downgrading, or
otherwise perhaps to add a new option to really force everything).
--8<---------------cut here---------------end--------------->8---

So AFAIU, using a fixed "autoreconf -fi" should mitigate the risk of
tampered .m4 macros (and of other possibly tampered build configuration
scripts)?

IMHO "ignoring" (i.e. deleting) pre-built build scripts in the Guix
build system(s) should be considered... or is it /already/ done?  (A
rough sketch of what such phases might look like follows the next
quoted message.)

Also, I found this thread [6] interesting, especially this message from
Jacob Bachmeyer:

--8<---------------cut here---------------start------------->8---
The *user* could catch issues like this backdoor, since the backdoor
appears (based on what I have read so far) to materialize certain object
files while configure is running, while `find . -iname '*.o'` /should/
return nothing before make is run.  This also suggests that running
"make clean" after configure would kill at least this backdoor.
--8<---------------cut here---------------end--------------->8---

Something to apply in Guix's gnu-build-system?
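To make these two ideas (regenerating the Autotools machinery, and
refusing pre-built objects) a bit more concrete, here is a minimal
sketch.  This is *not* existing gnu-build-system code: the phase names
and the file list are mine, it assumes autoconf, automake and friends
are among the native-inputs, and it will surely break packages whose
build system does not regenerate cleanly:

--8<---------------cut here---------------start------------->8---
(arguments
 (list
  #:phases
  #~(modify-phases %standard-phases
      ;; Hypothetical phase: drop the pre-generated Autotools files
      ;; shipped in the tarball and rebuild them from configure.ac and
      ;; Makefile.am, as proposed in [4].
      (add-after 'unpack 'regenerate-autotools
        (lambda _
          (for-each (lambda (file)
                      (when (file-exists? file)
                        (delete-file file)))
                    '("configure" "aclocal.m4" "Makefile.in"))
          (invoke "autoreconf" "-vfi")))
      ;; Hypothetical phase: per the observation above, no object files
      ;; should exist before 'make' has run; abort if some do.
      (add-after 'configure 'assert-no-prebuilt-objects
        (lambda _
          (let ((objects (find-files "." "\\.(o|lo|a)$")))
            (unless (null? objects)
              (error "object files present before the build phase:"
                     objects))))))))
--8<---------------cut here---------------end--------------->8---

Whether phases like these belong in individual packages or in
gnu-build-system itself, and how many packages would fail to regenerate
their build system, is probably part of the hurdles mentioned above.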
He also writes:

--8<---------------cut here---------------start------------->8---
A *very* observant (unreasonably so) user might notice that "make" did
not build the objects that the backdoor provided.
--8<---------------cut here---------------end--------------->8---

Is there a way to enhance gnu-build-system so that it notices when some
object was not built by make?

He then goes on explaining:

--8<---------------cut here---------------start------------->8---
Of course, an attacker could sneak around this as well by moving the
process for unpacking the backdoor object to a Makefile rule, but that
is more likely to "stick out" to an observant user, as well as being an
easy target for automated analysis ("Which files have 'special'
rules?") since you cannot obfuscate those from make(1) and expect them
to still work.
--8<---------------cut here---------------end--------------->8---

Given the above observation that «it is pragmatically impossible [...]
to peer review a tarball prepared in this manner», I strongly doubt that
a Makefile tampered with _in_the_release_tarball_ would be easy to peer
review; so I'd ask: would such an "automated analysis" (see above) be
feasible in a dedicated build-system phase?  A rough sketch of one
possible detection heuristic follows.
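For the first question - noticing objects that make did not build - here
is a minimal sketch, again *not* existing gnu-build-system code but
hypothetical phases to be spliced into %standard-phases as in the
previous sketch.  It relies on file modification times, which an
attacker could of course also forge, so it is only a heuristic: record a
timestamp just before the 'build phase and afterwards flag any object
file that is older than it.

--8<---------------cut here---------------start------------->8---
;; Hypothetical phases, a sketch only: flag object files whose mtime
;; predates the start of the 'build phase, i.e. files that were *not*
;; (re)built by make but materialized earlier (e.g. during configure).
(add-before 'build 'record-build-start
  (lambda _
    ;; Create an empty marker file whose mtime is "now".
    (call-with-output-file ".guix-build-start"
      (lambda (port) #t))))
(add-after 'build 'flag-objects-not-built-by-make
  (lambda _
    (let ((start (stat:mtime (stat ".guix-build-start"))))
      (for-each (lambda (object)
                  (when (< (stat:mtime (stat object)) start)
                    (format #t "warning: ~a was not rebuilt by make~%"
                            object)))
                (find-files "." "\\.(o|lo|a)$")))))
--8<---------------cut here---------------end--------------->8---

For the second question («which files have 'special' rules?») perhaps
the database printed by GNU make's --print-data-base option could be
post-processed in a similar phase, but I have no sketch for that.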
Anyway, I'm also asking myself: would a release tarball that *possibly
differs from the official code in the DVCS* but carries a *valid* GPG
signature (please see [3]) really have been peer reviewed, or is that
«pragmatically impossible» too?  In other words: what if the backdoor
had been injected directly into the source code of the *official*
release tarball, signed with a valid GPG signature (and obviously with a
valid sha256 hash)?  Do upstream developer communities peer review
release tarballs, or do they "just" peer review the code in the official
DVCS?

Also, in (info "(guix) origin Reference") I see that Guix packages can
have a list of URIs for the origin of the source code; see xz as an
example [7].  Are these intended to be multiple independent sources to
be compared with one another in order to detect possible tampering, or
are they "just" alternatives to be used when the first listed URI is
unavailable?  If the former, a mitigation would be to specify multiple
independent release sources for each package, so that it would be harder
to compromise all of them; but the availability of such independent
sources is not something under Guix's control.

All in all: should we really avoid release tarballs that are
«pragmatically impossible» to peer review?

WDYT?

Happy hacking! Gio'

[...]

[1] https://gist.github.com/thesamesam/223949d5a074ebc3dce9ee78baad9e27
    «FAQ on the xz-utils backdoor (CVE-2024-3094)» (constantly updated)

[2] https://gynvael.coldwind.pl/?lang=en&id=782
    «xz/liblzma: Bash-stage Obfuscation Explained»

[3] e.g. https://web.archive.org/web/20110708023004/http://www.h-online.com/open/news/item/Vsftpd-backdoor-discovered-in-source-code-update-1272310.html
    «Vsftpd backdoor discovered in source code - update»
    "a bad tarball had been downloaded from the vsftpd master site with
    an invalid GPG signature"

[4] https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/YWMNOEJ34Q7QLBWQAB5TM6A2SVJFU4RV/
    «Three steps we could take to make supply chain attacks a bit harder»

[5] https://lists.gnu.org/archive/html/bug-autoconf/2024-03/msg00000.html

[6] https://lists.gnu.org/archive/html/automake/2024-03/msg00007.html
    «GNU Coding Standards, automake, and the recent xz-utils backdoor»

[7] https://git.savannah.gnu.org/cgit/guix.git/tree/gnu/packages/compression.scm#n494

--8<---------------cut here---------------start------------->8---
(define-public xz
  (package
    (name "xz")
    (version "5.2.8")
    (source (origin
              (method url-fetch)
              (uri (list (string-append "http://tukaani.org/xz/xz-"
                                        version ".tar.gz")
                         (string-append "http://multiprecision.org/guix/xz-"
                                        version ".tar.gz")))
--8<---------------cut here---------------end--------------->8---

P.S.: in a way, I see this kind of attack as exploiting a form of
statefulness of the build system; in this case "build-to-host.m4" was
/state/.  I think that build systems, too, should be stateless, and Guix
is doing a great job of reaching this goal.

-- 
Giovanni Biscuolo

Xelera IT Infrastructures