From: Philip McGrath <philip@philipmcgrath.com>
To: guix <guix-devel@gnu.org>
Cc: Maxime Devos <maximedevos@telenet.be>,
Liliana Marie Prikler <liliana.prikler@gmail.com>,
Liliana Marie Prikler <liliana.prikler@ist.tugraz.at>
Subject: What 'sh' should 'system' use?
Date: Sun, 18 Sep 2022 20:13:44 -0400 [thread overview]
Message-ID: <2284386.8hzESeGDPO@bastet> (raw)
[-- Attachment #1: Type: text/plain, Size: 10168 bytes --]
Hi Guix,
The C standard library includes a function 'system' to run a string as a shell
command. Other languages provide similar functions, including Guile and many
other Schemes and the Standard ML Basis Library.[1][2] Even without a
dedicated library function, a program might want to run 'sh' using a general-
purpose mechanism for launching subprocesses.
How should the implementation of a function like 'system' find an 'sh'
executable?
This question came up most recently with the patch series updating to Racket
8.6 [3]: we already had patched in a workaround for Racket, but Chez Scheme
and Zuo also needed workarounds to find 'sh'. I'm going to try to summarize the
context I found in the course of that discussion and explain my current
thinking, which changed over the course of that thread.
I think Guix should decide on an approach for functions like 'system' that can
be applied consistently across languages. In particular, I don't think what
our 'glibc' package is currently doing makes sense under any sort of approach.
First, an overview of three ways of trying to answer this question:
1) Many programs assume 'sh' can be found at '/bin/sh', but this is not true
in Guix build environments, and it is not portable in general. Some systems
have historically had a non-POSIX shell at '/bin/sh' and a POSIX shell at '/
usr/xpg4/bin/sh'.
More significantly, on Android, the shell is never at '/bin/sh'! (It is
typically at '/system/bin/sh', except it is '/vendor/bin/sh' for "vendor
code", whatever that is.[4]) That may be relevant for upstreams' willingness
to reconsider this assumption.
In recent Python, the implementation of 'subprocess.Popen' uses [5]:
unix_shell = ('/system/bin/sh' if
hasattr(sys, 'getandroidapilevel') else '/bin/sh')
which accounts for Android while being completely non-general.
On the other hand, even Nix puts '/bin/sh' at its usual path: we are really
quite an outlier in this respect. (IIUC, Nix also has '/usr/bin/env', but no
other utilities at FHS paths.)
In Glibc, 'sysdeps/posix/system.c' assumes '/bin/sh', and Guix currently
patches it to refer to a store path as a string constant (but see below for
issues).
2) There is a non-standard but ubiquitous macro '_PATH_BSHELL' in 'paths.h'
which is supposed to give the path of a Bourne-like shell. In Guix, we patch
this to refer to a store path as a string constant (but again, see below for
issues). Notablty, on Android, it is not a compile-time constant: it is
#define _PATH_BSHELL __bionic_get_shell_path()
where the function returns '/system/bin/sh' or '/vendor/bin/sh' as appropriate
(but, in any case, it returns a `const char*` to a compile-time constant, so
no manual memory management is needed).
3) POSIX actually has an answer to the question of how to find 'sh', but,
unfortunately, its answer doesn't work in Guix build environments.
The POSIX spec for 'system' [6] says, in the informative section "Application
Usage":
> There is no defined way for an application to find the specific path for the
> shell. However, confstr() can provide a value for PATH that is guaranteed
> to find the sh utility.
Likewise, the spec for 'sh' [7] says in the corresponding section:
> Applications should note that the standard PATH to the shell cannot be
> assumed to be either /bin/sh or /usr/bin/sh, and should be determined by
> interrogation of the PATH returned by getconf PATH, ensuring that the
> returned pathname is an absolute pathname and not a shell built-in.
Most emphatically, the spec for 'confstr' [8] says in the normative section
"Description":
> If the implementation supports the POSIX shell option, the string stored in
> buf after a call to:
>
> confstr(_CS_PATH, buf, sizeof(buf))
>
> can be used as a value of the PATH environment variable that accesses all of
> the standard utilities of POSIX.1-2017, that are provided in a manner
> accessible via the exec family of functions, if the return value is less
> than or equal to sizeof(buf).
It's worth noting here that 'PATH' is explicitly not consulted. Likewise, from
the rationale section of [6]:
> One reviewer suggested that an implementation of system() might want to use
> an environment variable such as SHELL to determine which command
> interpreter to use. The supposed implementation would use the default
> command interpreter if the one specified by the environment variable was
> not available. This would allow a user, when using an application that
> prompts for command lines to be processed using system(), to specify a
> different command interpreter. Such an implementation is discouraged. If
> the alternate command interpreter did not follow the command line syntax
> specified in the Shell and Utilities volume of POSIX.1-2017, then changing
> SHELL would render system() non-conforming. This would affect applications
> that expected the specified behavior from system(), and since the Shell and
> Utilities volume of POSIX.1-2017 does not mention that SHELL affects
> system(), the application would not know that it needed to unset SHELL.
It seems that 'confstr' is supposed to access "configuration-defined string
values", i.e. with possible configuration applied runtime, in contrast to the
compile-time 'CS_PATH' (without an underscore).
Unfortunately, AFAICT, Glibc's 'confstr' implementation for '_CS_PATH' doesn't
have any mechanism for configuring the search path: it simply returns the
compile-time constant, which is:
#define CS_PATH "/bin:/usr/bin"
and neither of those directories exist in Guix build environments.
So, with that context in mind, what are the problems with Guix's Glibc, and
with existing solutions more generally?
First, a test program I tried in [9] seemed to indicate that '_PATH_BSHELL'
refered to 'bash-static', but 'system("echo $BASH")' referred to 'bash-
minimal'. It's possible that my test gave an incorrect answer: I just tried
'guix size glibc' (I hadn't thought of that earlier), and it doesn't list a
reference to 'bash-minimal'. But, if we are embedding references in libc to
two different copies of Bash, that seems clearly bad.
More broadly, I now think it would be better in we embedded zero references to
copies of Bash in libc.
I have changed my mind on this before, and I could be persuaded otherwise.
When I wrote the Racket patch for '/bin/sh' that had been in place before the
latest change, I initially was going to use a hard-coded Bash only when '/bin/
sh' did not exist, but the discussion persuaded me it would make more sense to
always use the 'sh' from the package's inputs.[10] For Racket, a dependency on
'sh' didn't seem too unreasonable.
However, giving every program using Glibc a hard dependency on Bash—and on a
particular Bash executable—seems like a much bigger imposition.
I now think it would be better to find 'sh' dynamically at run time rather than
embed a reference to a specific shell at compile time. When 'sh' is needed, it
can be provided by a build system or as an explicit input. When 'sh' isn't
needed and perhaps isn't wanted, we should be able to create environments and
programs without it, without libc pulling it along.
I found this note from the Linux man-pages project [11] interesting in that
regard:
> In versions of glibc before 2.1.3, the check for the availability
> of /bin/sh was not actually performed if command was NULL;
> instead it was always assumed to be available, and system()
> always returned 1 in this case. Since glibc 2.1.3, this check is
> performed because, even though POSIX.1-2001 requires a conforming
> implementation to provide a shell, that shell may not be
> available or executable if the calling program has previously
> called chroot(2) (which is not specified by POSIX.1-2001).
Finally, some possible courses of action:
1) If we want to continue to hard-code a specific shell into Glibc, I think we
should document the decision (for example, why 'bash-static' vs. 'bash-
minimal'?) and recommendations for how packages should use it: '_PATH_BSHELL'
is the best mechanism I've heard of so far, though I wish it were
standardized, and the fact that it can't be portably assumed to be a string
constant could be surprising.
2) If we want to make 'sh' a weak/dynamic reference, I think we should
strongly consider arranging to make it available at '/bin/sh' when present. I
expect this option would require less patching of other packages *by far* than
any other approach.
3) If we want a dynamic 'sh' not located at '/bin/sh', I think we should
implement a function similar to '__bionic_get_shell_path()' and use it for
'_PATH_BSHELL', 'system', etc. That begs the question of how the function
should find 'sh', and I don't have an answer for that. In principle, we could
design a configuration mechanism for 'confstr(_CS_PATH, buf, sizeof(buf))' and
use it to find the shell: that has some appeal, but making the mechanism
extensible enough to support "all of the standard utilities of POSIX.1-2017"
seems like a challenge.
What do you think?
-Philip
[1]: https://smlfamily.github.io/Basis/os-process.html#SIG:OS_PROCESS.system:VAL
[2]: https://lists.gnu.org/archive/html/help-guix/2021-11/msg00036.html
[3]: https://issues.guix.gnu.org/57050
[4]: https://android.googlesource.com/platform/bionic/+/master/libc/bionic/
__bionic_get_shell_path.cpp
[5]: https://github.com/python/cpython/blob/
8184f0fce3b734413e3d3a282f1425d3cb8507fd/Lib/subprocess.py#L1760-L1762
[6]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/system.html
[7]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sh.html
[8]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/confstr.html
[9]: https://issues.guix.gnu.org/57050#63
[10]: https://issues.guix.gnu.org/47180
[11]: https://man7.org/linux/man-pages/man3/system.3.html
[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
next reply other threads:[~2022-09-19 0:18 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-09-19 0:13 Philip McGrath [this message]
2022-09-19 7:07 ` What 'sh' should 'system' use? Liliana Marie Prikler
2022-09-26 8:07 ` Philip McGrath
2022-09-26 10:04 ` Liliana Marie Prikler
2022-09-19 12:55 ` Maxime Devos
2022-09-26 7:04 ` Philip McGrath
2022-09-26 9:41 ` Liliana Marie Prikler
2022-09-26 12:24 ` Maxime Devos
2022-10-01 16:54 ` Ludovic Courtès
2022-10-15 23:23 ` Philip McGrath
2022-10-16 7:04 ` Liliana Marie Prikler
2022-10-16 7:56 ` Philip McGrath
2022-10-16 8:23 ` Liliana Marie Prikler
2022-10-19 15:30 ` Ludovic Courtès
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://guix.gnu.org/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2284386.8hzESeGDPO@bastet \
--to=philip@philipmcgrath.com \
--cc=guix-devel@gnu.org \
--cc=liliana.prikler@gmail.com \
--cc=liliana.prikler@ist.tugraz.at \
--cc=maximedevos@telenet.be \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/guix.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).