From: Philip McGrath
To: guix
Cc: Maxime Devos, Liliana Marie Prikler, Liliana Marie Prikler
Subject: What 'sh' should 'system' use?
Date: Sun, 18 Sep 2022 20:13:44 -0400
Message-ID: <2284386.8hzESeGDPO@bastet>
MIME-Version: 1.0
Content-Type: multipart/signed; boundary="nextPart2658274.1BCLMh4Saa";
 micalg="pgp-sha512"; protocol="application/pgp-signature"
List-Id: "Development of GNU Guix and the GNU System distribution."

--nextPart2658274.1BCLMh4Saa
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="UTF-8"; protected-headers="v1"

Hi Guix,

The C standard library includes a function 'system' to run a string as a
shell command.
Other languages provide similar functions, including Guile and many
other Schemes and the Standard ML Basis Library.[1][2] Even without a
dedicated library function, a program might want to run 'sh' using a
general-purpose mechanism for launching subprocesses.

How should the implementation of a function like 'system' find an 'sh'
executable?

This question came up most recently with the patch series updating to
Racket 8.6 [3]: we already had patched in a workaround for Racket, but
Chez Scheme and Zuo also needed workarounds to find 'sh'. I'm going to
try to summarize the context I found in the course of that discussion and
explain my current thinking, which changed over the course of that thread.

I think Guix should decide on an approach for functions like 'system'
that can be applied consistently across languages. In particular, I don't
think what our 'glibc' package is currently doing makes sense under any
sort of approach.

First, an overview of three ways of trying to answer this question:

1) Many programs assume 'sh' can be found at '/bin/sh', but this is not
true in Guix build environments, and it is not portable in general. Some
systems have historically had a non-POSIX shell at '/bin/sh' and a POSIX
shell at '/usr/xpg4/bin/sh'.

More significantly, on Android, the shell is never at '/bin/sh'! (It is
typically at '/system/bin/sh', except it is '/vendor/bin/sh' for "vendor
code", whatever that is.[4]) That may be relevant for upstreams'
willingness to reconsider this assumption.

In recent Python, the implementation of 'subprocess.Popen' uses [5]:

    unix_shell = ('/system/bin/sh' if
              hasattr(sys, 'getandroidapilevel') else '/bin/sh')

which accounts for Android while being completely non-general.

On the other hand, even Nix puts '/bin/sh' at its usual path: we are
really quite an outlier in this respect. (IIUC, Nix also has
'/usr/bin/env', but no other utilities at FHS paths.)

In Glibc, 'sysdeps/posix/system.c' assumes '/bin/sh', and Guix currently
patches it to refer to a store path as a string constant (but see below
for issues).

2) There is a non-standard but ubiquitous macro '_PATH_BSHELL' in
'paths.h' which is supposed to give the path of a Bourne-like shell. In
Guix, we patch this to refer to a store path as a string constant (but
again, see below for issues). Notably, on Android, it is not a
compile-time constant: it is

    #define _PATH_BSHELL __bionic_get_shell_path()

where the function returns '/system/bin/sh' or '/vendor/bin/sh' as
appropriate (but, in any case, it returns a `const char*` to a
compile-time constant, so no manual memory management is needed).

3) POSIX actually has an answer to the question of how to find 'sh', but,
unfortunately, its answer doesn't work in Guix build environments.

The POSIX spec for 'system' [6] says, in the informative section
"Application Usage":

> There is no defined way for an application to find the specific path for the
> shell. However, confstr() can provide a value for PATH that is guaranteed
> to find the sh utility.

Likewise, the spec for 'sh' [7] says in the corresponding section:

> Applications should note that the standard PATH to the shell cannot be
> assumed to be either /bin/sh or /usr/bin/sh, and should be determined by
> interrogation of the PATH returned by getconf PATH, ensuring that the
> returned pathname is an absolute pathname and not a shell built-in.

Most emphatically, the spec for 'confstr' [8] says in the normative
section "Description":

> If the implementation supports the POSIX shell option, the string stored in
> buf after a call to:
>
>     confstr(_CS_PATH, buf, sizeof(buf))
>
> can be used as a value of the PATH environment variable that accesses all of
> the standard utilities of POSIX.1-2017, that are provided in a manner
> accessible via the exec family of functions, if the return value is less
> than or equal to sizeof(buf).

It's worth noting here that 'PATH' is explicitly not consulted. Likewise,
from the rationale section of [6]:

> One reviewer suggested that an implementation of system() might want to use
> an environment variable such as SHELL to determine which command
> interpreter to use. The supposed implementation would use the default
> command interpreter if the one specified by the environment variable was
> not available. This would allow a user, when using an application that
> prompts for command lines to be processed using system(), to specify a
> different command interpreter. Such an implementation is discouraged. If
> the alternate command interpreter did not follow the command line syntax
> specified in the Shell and Utilities volume of POSIX.1-2017, then changing
> SHELL would render system() non-conforming. This would affect applications
> that expected the specified behavior from system(), and since the Shell and
> Utilities volume of POSIX.1-2017 does not mention that SHELL affects
> system(), the application would not know that it needed to unset SHELL.

It seems that 'confstr' is supposed to access "configuration-defined
string values", i.e. values to which configuration may be applied at run
time, in contrast to the compile-time 'CS_PATH' (without an underscore).

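A minimal Python sketch of the lookup that POSIX describes, for
illustration only and not taken from any of the cited projects: it asks
'os.confstr' for the 'CS_PATH' value and searches only those directories
for 'sh', ignoring both 'PATH' and 'SHELL'. The function name
'find_posix_sh' is hypothetical.

```python
import os
import shutil


def find_posix_sh():
    """Return an absolute path to 'sh' found via confstr(_CS_PATH), or None.

    Neither the PATH nor the SHELL environment variable is consulted.
    """
    if not hasattr(os, "confstr"):
        # os.confstr is only available on Unix-like systems.
        return None
    try:
        cs_path = os.confstr("CS_PATH")
    except (ValueError, OSError):
        # The name is unknown, or the host system does not support it.
        return None
    if not cs_path:
        return None
    # Search only the directories named by the implementation.
    found = shutil.which("sh", path=cs_path)
    if found is not None and os.path.isabs(found):
        return found
    return None


if __name__ == "__main__":
    print(find_posix_sh())
```

When none of the directories named by 'CS_PATH' exists, the function
returns None instead of guessing a location.
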
Unfortunately, AFAICT, Glibc's 'confstr' implementation for '_CS_PATH'
doesn't have any mechanism for configuring the search path: it simply
returns the compile-time constant, which is:

    #define CS_PATH "/bin:/usr/bin"

and neither of those directories exists in Guix build environments.

So, with that context in mind, what are the problems with Guix's Glibc,
and with existing solutions more generally?

First, a test program I tried in [9] seemed to indicate that
'_PATH_BSHELL' referred to 'bash-static', but 'system("echo $BASH")'
referred to 'bash-minimal'. It's possible that my test gave an incorrect
answer: I just tried 'guix size glibc' (I hadn't thought of that
earlier), and it doesn't list a reference to 'bash-minimal'. But, if we
are embedding references in libc to two different copies of Bash, that
seems clearly bad.

More broadly, I now think it would be better if we embedded zero
references to copies of Bash in libc.

I have changed my mind on this before, and I could be persuaded
otherwise. When I wrote the Racket patch for '/bin/sh' that had been in
place before the latest change, I initially was going to use a hard-coded
Bash only when '/bin/sh' did not exist, but the discussion persuaded me
it would make more sense to always use the 'sh' from the package's
inputs.[10] For Racket, a dependency on 'sh' didn't seem too unreasonable.

However, giving every program using Glibc a hard dependency on Bash, and
on a particular Bash executable, seems like a much bigger imposition.

I now think it would be better to find 'sh' dynamically at run time
rather than embed a reference to a specific shell at compile time. When
'sh' is needed, it can be provided by a build system or as an explicit
input. When 'sh' isn't needed and perhaps isn't wanted, we should be able
to create environments and programs without it, without libc pulling it
along.

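For comparison, a run-time probe of fixed locations can be sketched in a
few lines of Python. This only illustrates looking for 'sh' when it is
needed instead of embedding a path at build time; it is not a proposal
for how Glibc should do it. The candidate list is just the set of
locations mentioned earlier in this message, its order is an arbitrary
assumption, and the name 'first_available_shell' is hypothetical.

```python
import os

# Fixed locations mentioned in this message; the order is an arbitrary choice.
CANDIDATE_SHELLS = (
    "/bin/sh",
    "/usr/xpg4/bin/sh",
    "/system/bin/sh",
    "/vendor/bin/sh",
)


def first_available_shell(candidates=CANDIDATE_SHELLS):
    """Return the first candidate that is an executable regular file, else None."""
    for path in candidates:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None


if __name__ == "__main__":
    shell = first_available_shell()
    print(shell if shell is not None else "no shell at the fixed locations")
```

Returning None makes the "no shell available" case explicit, so a caller
can report it rather than assume that a shell exists.
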
I found this note from the Linux man-pages project [11] interesting in
that regard:

> In versions of glibc before 2.1.3, the check for the availability
> of /bin/sh was not actually performed if command was NULL;
> instead it was always assumed to be available, and system()
> always returned 1 in this case. Since glibc 2.1.3, this check is
> performed because, even though POSIX.1-2001 requires a conforming
> implementation to provide a shell, that shell may not be
> available or executable if the calling program has previously
> called chroot(2) (which is not specified by POSIX.1-2001).

Finally, some possible courses of action:

1) If we want to continue to hard-code a specific shell into Glibc, I
think we should document the decision (for example, why 'bash-static' vs.
'bash-minimal'?) and recommendations for how packages should use it:
'_PATH_BSHELL' is the best mechanism I've heard of so far, though I wish
it were standardized, and the fact that it can't be portably assumed to
be a string constant could be surprising.

2) If we want to make 'sh' a weak/dynamic reference, I think we should
strongly consider arranging to make it available at '/bin/sh' when
present. I expect this option would require less patching of other
packages *by far* than any other approach.

3) If we want a dynamic 'sh' not located at '/bin/sh', I think we should
implement a function similar to '__bionic_get_shell_path()' and use it
for '_PATH_BSHELL', 'system', etc. That raises the question of how the
function should find 'sh', and I don't have an answer for that. In
principle, we could design a configuration mechanism for
'confstr(_CS_PATH, buf, sizeof(buf))' and use it to find the shell: that
has some appeal, but making the mechanism extensible enough to support
"all of the standard utilities of POSIX.1-2017" seems like a challenge.

What do you think?

-Philip

[1]: https://smlfamily.github.io/Basis/os-process.html#SIG:OS_PROCESS.system:VAL
[2]: https://lists.gnu.org/archive/html/help-guix/2021-11/msg00036.html
[3]: https://issues.guix.gnu.org/57050
[4]: https://android.googlesource.com/platform/bionic/+/master/libc/bionic/__bionic_get_shell_path.cpp
[5]: https://github.com/python/cpython/blob/8184f0fce3b734413e3d3a282f1425d3cb8507fd/Lib/subprocess.py#L1760-L1762
[6]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/system.html
[7]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sh.html
[8]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/confstr.html
[9]: https://issues.guix.gnu.org/57050#63
[10]: https://issues.guix.gnu.org/47180
[11]: https://man7.org/linux/man-pages/man3/system.3.html

--nextPart2658274.1BCLMh4Saa
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: This is a digitally signed message part.
Content-Transfer-Encoding: 7Bit

-----BEGIN PGP SIGNATURE-----

iQIzBAABCgAdFiEE9GWrrNY3rqwUFVXPygNjjfo/HHoFAmMntDgACgkQygNjjfo/
HHr5nQ//U2VpKBTb1YKfYqVXzSqeIadzVcKhAtE/RryxVrk+/3nFggH2puIAASQm
3AFY2+9SxvMr+8+toKCeU2OfzqTPsTW1dnJGdptuCgRNmy65KSK3xaHllZnKwq5W
Q8mBJBXfMFfIAXsh5nUm7SjFRjYrXjunKp55izzrPZYkl5Uc96XDvCqKz7knc2NX
ZLyGG41pSmq2/eIFIMIo8o2mt5jKEGjXIPgSBbvsndPeeVmPafk9PoBl+S1cC/WZ
GKgZ4PhpBpTXII0mdICVxmWEU9eDARFsKIGEe9elhJybXAc8RKz7W5wC14OhbtgK
zCs9hCndIFroGC7Pp4Gdtsr9F5dTJjMvMtPkqvkM9iAUJpJ/aWTRvHojDhcR1YZO
1aRwefl7TT///8w3CeWnM1b37uw6obCRFWDBFTR7i88VsrtAL3rv7KU+wl6Yytzx
yQWmtwZbXAroPyzxzYdd87/0uswayrPHR33pdL9JeuQwCq5o5Uxdk2lGgxQs89+1
J+Bq/tcmVLLQ2H/KvycfITTraTZibr4RNQ0WK/4fV/Ngkqsal2omGtlFgLWzjgpJ
w0nIoGd6lkCaok9hKVgAY/uwhsqyRpfsXQh0injiQoXrWTwBcwvB+IATG3L87i4i
75YVCJ88FkIuVOtE6wKzMamBuPcNaMCA0kee5qtAsw/Jh8LQvL8=
=wetb
-----END PGP SIGNATURE-----

--nextPart2658274.1BCLMh4Saa--