Hello,

a couple of additional (IMO) useful resources...

Giovanni Biscuolo writes:

[...]

> Let me highlight this: «It is pragmatically impossible [...] to peer
> review a tarball prepared in this manner.»
>
> There is no doubt that the release tarball is a very weak "trusted
> source" (trusted by peer review, not by authority) than the upstream
> DVCS repository.

This kind of attack was described by Daniel Stenberg in his «HOWTO
backdoor curl» article of 2021-03-30 as the "skip-git-altogether"
method: https://daniel.haxx.se/blog/2021/03/30/howto-backdoor-curl/

--8<---------------cut here---------------start------------->8---

The skip-git-altogether methods

As I’ve described above, it is really hard even for a skilled developer
to write a backdoor and have that landed in the curl git repository and
stick there for longer than just a very brief period.

If the attacker instead can just sneak the code directly into a release
archive then it won’t appear in git, it won’t get tested and it won’t
get easily noticed by team members!

curl release tarballs are made by me, locally on my machine. After I’ve
built the tarballs I sign them with my GPG key and upload them to the
curl.se origin server for the world to download. (Web users don’t
actually hit my server when downloading curl. The user visible web site
and downloads are hosted by Fastly servers.)

An attacker that would infect my release scripts (which btw are also in
the git repository) or do something to my machine could get something
into the tarball and then have me sign it and then create the “perfect
backdoor” that isn’t detectable in git and requires someone to diff the
release with git in order to detect – which usually isn’t done by anyone
that I know of.

[...]

I of course do my best to maintain proper login sanitation, updated
operating systems and use of safe passwords and encrypted communications
everywhere. But I’m also a human so I’m bound to do occasional mistakes.

Another way could be for the attacker to breach the origin download
server and replace one of the tarballs there with an infected version,
and hope that people skip verifying the signature when they download it
or otherwise notice that the tarball has been modified. I do my best at
maintaining server security to keep that risk to a minimum.

Most people download the latest release, and then it’s enough if a
subset checks the signature for the attack to get revealed sooner rather
than later.

--8<---------------cut here---------------end--------------->8---

Unfortunately, in that section Stenberg omits one attack vector he
mentioned in an earlier section of the article, named "The tricking a
user method":

--8<---------------cut here---------------start------------->8---

We can even include more forced “convincing” such as direct threats
against persons or their families: “push this code or else…”. This way
of course cannot be protected against using 2fa, better passwords or
things like that.

--8<---------------cut here---------------end--------------->8---

...and an attack vector involving more subtle ways (let's call it
distributed social engineering) to convince the upstream developer and
other contributors and/or third parties that they need a project
co-maintainer authorized to publish _official_ release tarballs.

Following Stenberg's attack classification, since the supply-chain
attack was intended to install a backdoor in the _sshd_ service, and
_not_ in xz-utils or liblzma, we can classify this attack as:

  skip-git-altogether to install a backdoor further-down-the-chain,
  precisely in a _dependency_ of the attacked one, during a period of
  "weakness" of the upstream maintainers

Stenberg closes his article with this update and one related reply to a
comment:

--8<---------------cut here---------------start------------->8---

Dependencies

Added after the initial post.
Lots of people have mentioned that curl can get built with many
dependencies and maybe one of those would be an easier or better
target. Maybe they are, but they are products of their own individual
projects and an attack on those projects/products would not be an attack
on curl or backdoor in curl by my way of looking at it.

In the curl project we ship the source code for curl and libcurl and the
users, the ones that builds the binaries from that source code will get
the dependencies too.

[...]

Jean Hominal says:
April 1, 2021 at 14:04

I think the big difference why you “missed” dependencies as an attack
vector is because today, most application developers ship their
dependencies in their application binaries (by linking statically or
shipping a container) – in such a case, I would definitely count an
attack on such a dependency, that is then shipped as part of the
project’s artifacts, as a successful attack on the project.

However, as you only ship a source artifact – of course, dependencies
*are* out of scope in your case.

Daniel Stenberg says:
April 1, 2021 at 15:05

Jean: Right. I don’t want to dismiss the risk or the danger of an attack
to a curl dependency. However, it is not possible for me or the curl
project to keep them safe!

--8<---------------cut here---------------end--------------->8---

That leaves a number of open questions about some developers' attitude
towards _distributing_ their software, but that's off-topic here IMO.

Anyway, let me highlight, again, the "pragmatically impossible peer
review of release tarballs" argument; Stenberg says: «the “perfect
backdoor” that isn’t detectable in git and requires someone to diff the
release with git in order to detect – which usually isn’t done by anyone
that I know of.»

[...]

> Is it possible to enhance our build-system(s) (e.g. gnu-build-system) so
> they can /ignore/ pre-built .m4 or similar script and rebuild them
> during the build process?

There is a related security issue for PHP [1], with an interesting
thread on the php.internals mailing list (via externals.io [2]):

--8<---------------cut here---------------start------------->8---

Consider removing autogenerated files from tarballs

[...]

I believe that it would be a good idea to remove the huge attack surface
offered by the pre-generated autoconf build scripts and lexers, offered
in the release tarballs.

[...]

this injection mode makes sense, as extra files in the tarball not
present in the git repo would raise suspicions, but machine-generated
configure scripts containing hundreds of thousands of lines of code not
present in the upstream VCS are the norm, and are usually not checked
before execution.

[...]

Specifically in the case of PHP, along from the configure script, the
tarball also bundles generated lexer files which contain actual C code,
which is an additional attack vector

[...]

To prevent attacks from malevolent/compromised RMs, I propose completely
removing all autogenerated files from the release tarballs, and ensuring
their content exactly matches the content of the associated git tag

[...]

Of course this means that users will have to generate the build scripts
when compiling PHP, as when installing PHP from the VCS repo.

[...]

Distros like arch linux already re-generate the configure scripts from
scratch, but I believe that no distinction should be made, everyone
should get a tarball containing only the bare source code, without
leaving to the user the choice to re-generate the build files, or use a
potentially compromised build script.

[...]

The current standard way of distributing generated configure files in
tarballs is precisely what allowed the xz supply chain attack to go
unnoticed for so long. I strongly believe all projects using autotools,
including PHP, should switch away from this "standard" way of doing
things.

[...]

when a user downloads a source tarball, there's a false sense of
security rooted in the mistaken belief that the source code in the
tarball matches the one distributed in the VCS, but in reality, the
tarball also contains potentially malicious semi-compiled blobs, not
present in the VCS.

--8<---------------cut here---------------end--------------->8---

Are "configure scripts containing hundreds of thousands of lines of code
not present in the upstream VCS" really the norm?

If so, can we consider hundreds of thousands of lines of configure
scripts and other (auto)generated files bundled in release tarballs
"pragmatically impossible" to peer review?

Can we consider those artifacts as sort-of-binary and "force" our build
systems to _regenerate_ *all* of them?

...or is it better to completely avoid release tarballs as our source
URIs?

[...]

Thanks, Gio'

[1] https://github.com/php/php-src/issues/13838

[2] https://externals.io/message/122811

-- 
Giovanni Biscuolo

Xelera IT Infrastructures
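
P.S. The "diff the release with git" check that Stenberg says is usually
not done by anyone can be roughly sketched as below. This is only a
minimal illustration under stated assumptions, not anything taken from
curl, PHP or Guix: the function name compare_tarball_to_checkout, its
parameters and the single-top-level-directory tarball layout are all
hypothetical, and the checkout is assumed to be a plain directory
already checked out at the release tag.

```python
import hashlib
import tarfile
from pathlib import Path


def compare_tarball_to_checkout(tarball, checkout, strip_components=1):
    """Compare regular files in a release tarball with a VCS checkout.

    Hypothetical helper: returns (extra, modified), where `extra` lists
    files present only in the tarball and `modified` lists files whose
    content differs from the checkout. Nothing is extracted to disk.
    """
    checkout = Path(checkout)
    extra, modified = [], []
    with tarfile.open(tarball) as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, symlinks, devices, ...
            # Assumption: the tarball wraps everything in one top-level
            # directory (e.g. "pkg-1.0/"), dropped via strip_components.
            parts = Path(member.name).parts[strip_components:]
            if not parts or ".." in parts:
                continue  # nothing left, or a suspicious path
            rel = Path(*parts)
            data = tar.extractfile(member).read()
            local = checkout / rel
            if not local.is_file():
                extra.append(rel.as_posix())
            elif (hashlib.sha256(data).digest()
                  != hashlib.sha256(local.read_bytes()).digest()):
                modified.append(rel.as_posix())
    return sorted(extra), sorted(modified)
```

In such a report, generated files like configure would typically show
up in the "extra" list: these are the candidates a build system could
delete and regenerate. The sketch only tells you what differs; it says
nothing about whether a difference is malicious.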