unofficial mirror of guix-devel@gnu.org 
* What 'sh' should 'system' use?
@ 2022-09-19  0:13 Philip McGrath
  2022-09-19  7:07 ` Liliana Marie Prikler
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Philip McGrath @ 2022-09-19  0:13 UTC (permalink / raw)
  To: guix; +Cc: Maxime Devos, Liliana Marie Prikler, Liliana Marie Prikler

Hi Guix,

The C standard library includes a function 'system' to run a string as a shell 
command. Other languages provide similar functions, including Guile and many 
other Schemes and the Standard ML Basis Library.[1][2] Even without a 
dedicated library function, a program might want to run 'sh' using a
general-purpose mechanism for launching subprocesses.

How should the implementation of a function like 'system' find an 'sh' 
executable?

This question came up most recently with the patch series updating to Racket 
8.6 [3]: we already had patched in a workaround for Racket, but Chez Scheme 
and Zuo also needed workarounds to find 'sh'. I'm going to try to summarize the 
context I found in the course of that discussion and explain my current 
thinking, which changed over the course of that thread.

I think Guix should decide on an approach for functions like 'system' that can 
be applied consistently across languages. In particular, I don't think what 
our 'glibc' package is currently doing makes sense under any sort of approach.

First, an overview of three ways of trying to answer this question:

1) Many programs assume 'sh' can be found at '/bin/sh', but this is not true 
in Guix build environments, and it is not portable in general. Some systems 
have historically had a non-POSIX shell at '/bin/sh' and a POSIX shell at
'/usr/xpg4/bin/sh'.

More significantly, on Android, the shell is never at '/bin/sh'! (It is 
typically at '/system/bin/sh', except it is '/vendor/bin/sh' for "vendor 
code", whatever that is.[4]) That may be relevant for upstreams' willingness 
to reconsider this assumption.

In recent Python, the implementation of 'subprocess.Popen' uses [5]:

                unix_shell = ('/system/bin/sh' if
                          hasattr(sys, 'getandroidapilevel') else '/bin/sh')

which accounts for Android while being completely non-general.

On the other hand, even Nix puts '/bin/sh' at its usual path: we are really 
quite an outlier in this respect. (IIUC, Nix also has '/usr/bin/env', but no 
other utilities at FHS paths.)

In Glibc, 'sysdeps/posix/system.c' assumes '/bin/sh', and Guix currently 
patches it to refer to a store path as a string constant (but see below for 
issues).

2) There is a non-standard but ubiquitous macro '_PATH_BSHELL' in 'paths.h' 
which is supposed to give the path of a Bourne-like shell. In Guix, we patch 
this to refer to a store path as a string constant (but again, see below for 
issues). Notably, on Android, it is not a compile-time constant: it is

    #define _PATH_BSHELL __bionic_get_shell_path()

where the function returns '/system/bin/sh' or '/vendor/bin/sh' as appropriate 
(but, in any case, it returns a `const char*` to a compile-time constant, so 
no manual memory management is needed).

3) POSIX actually has an answer to the question of how to find 'sh', but, 
unfortunately, its answer doesn't work in Guix build environments.

The POSIX spec for 'system' [6] says, in the informative section "Application 
Usage":

> There is no defined way for an application to find the specific path for the
> shell. However, confstr() can provide a value for PATH that is guaranteed
> to find the sh utility.

Likewise, the spec for 'sh' [7] says in the corresponding section:

> Applications should note that the standard PATH to the shell cannot be
> assumed to be either /bin/sh or /usr/bin/sh, and should be determined by
> interrogation of the PATH returned by getconf PATH, ensuring that the
> returned pathname is an absolute pathname and not a shell built-in.

Most emphatically, the spec for 'confstr' [8] says in the normative section 
"Description":

> If the implementation supports the POSIX shell option, the string stored in
> buf after a call to:
> 
>     confstr(_CS_PATH, buf, sizeof(buf))
> 
> can be used as a value of the PATH environment variable that accesses all of
> the standard utilities of POSIX.1-2017, that are provided in a manner
> accessible via the exec family of functions, if the return value is less
> than or equal to sizeof(buf).

It's worth noting here that 'PATH' is explicitly not consulted. Likewise, from 
the rationale section of [6]:

> One reviewer suggested that an implementation of system() might want to use
> an environment variable such as SHELL to determine which command
> interpreter to use. The supposed implementation would use the default
> command interpreter if the one specified by the environment variable was
> not available. This would allow a user, when using an application that
> prompts for command lines to be processed using system(), to specify a
> different command interpreter. Such an implementation is discouraged. If
> the alternate command interpreter did not follow the command line syntax
> specified in the Shell and Utilities volume of POSIX.1-2017, then changing
> SHELL would render system() non-conforming. This would affect applications
> that expected the specified behavior from system(), and since the Shell and
> Utilities volume of POSIX.1-2017 does not mention that SHELL affects
> system(), the application would not know that it needed to unset SHELL.

It seems that 'confstr' is supposed to access "configuration-defined string
values", i.e. with any configuration applied at run time, in contrast to the
compile-time 'CS_PATH' (without an underscore).

Unfortunately, AFAICT, Glibc's 'confstr' implementation for '_CS_PATH' doesn't 
have any mechanism for configuring the search path: it simply returns the 
compile-time constant, which is:

    #define	CS_PATH	"/bin:/usr/bin"

and neither of those directories exist in Guix build environments.
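
As a rough illustration of the lookup POSIX describes, here is a minimal
Python sketch. It assumes a platform where Python's 'os.confstr' knows the
name 'CS_PATH' (as it does with Glibc); the helper name 'find_posix_sh' and
its 'search_path' parameter are hypothetical, used only for illustration.

```python
import os
import shutil


def find_posix_sh(search_path=None):
    """Return an absolute path to 'sh', or None if no 'sh' is found.

    If search_path is None, ask the C library for its _CS_PATH value
    instead of trusting the PATH environment variable.
    """
    if search_path is None:
        try:
            # Python exposes _CS_PATH under the name "CS_PATH".
            search_path = os.confstr("CS_PATH")
        except (ValueError, OSError):
            # The name is unknown or unsupported on this platform.
            search_path = None
    if not search_path:
        return None
    found = shutil.which("sh", path=search_path)
    return os.path.abspath(found) if found else None


if __name__ == "__main__":
    # With an unpatched Glibc this searches "/bin:/usr/bin", so the result
    # is None wherever neither directory provides an 'sh'.
    print(find_posix_sh())
```

Passing an explicit 'search_path' stands in for the kind of run-time
configuration that Glibc's 'confstr' does not currently offer.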

So, with that context in mind, what are the problems with Guix's Glibc, and 
with existing solutions more generally?

First, a test program I tried in [9] seemed to indicate that '_PATH_BSHELL'
referred to 'bash-static', but 'system("echo $BASH")' referred to
'bash-minimal'. It's possible that my test gave an incorrect answer: I just tried
'guix size glibc' (I hadn't thought of that earlier), and it doesn't list a 
reference to 'bash-minimal'. But, if we are embedding references in libc to 
two different copies of Bash, that seems clearly bad.

More broadly, I now think it would be better if we embedded zero references to
copies of Bash in libc.

I have changed my mind on this before, and I could be persuaded otherwise. 
When I wrote the Racket patch for '/bin/sh' that had been in place before the 
latest change, I initially was going to use a hard-coded Bash only when
'/bin/sh' did not exist, but the discussion persuaded me it would make more
sense to
always use the 'sh' from the package's inputs.[10] For Racket, a dependency on 
'sh' didn't seem too unreasonable.

However, giving every program using Glibc a hard dependency on Bash—and on a 
particular Bash executable—seems like a much bigger imposition.

I now think it would be better to find 'sh' dynamically at run time rather than 
embed a reference to a specific shell at compile time. When 'sh' is needed, it 
can be provided by a build system or as an explicit input. When 'sh' isn't 
needed and perhaps isn't wanted, we should be able to create environments and 
programs without it, without libc pulling it along.

I found this note from the Linux man-pages project [11] interesting in that 
regard:

>        In versions of glibc before 2.1.3, the check for the availability
>        of /bin/sh was not actually performed if command was NULL;
>        instead it was always assumed to be available, and system()
>        always returned 1 in this case.  Since glibc 2.1.3, this check is
>        performed because, even though POSIX.1-2001 requires a conforming
>        implementation to provide a shell, that shell may not be
>        available or executable if the calling program has previously
>        called chroot(2) (which is not specified by POSIX.1-2001).

Finally, some possible courses of action:

1) If we want to continue to hard-code a specific shell into Glibc, I think we 
should document the decision (for example, why 'bash-static' vs.
'bash-minimal'?) and recommendations for how packages should use it:
'_PATH_BSHELL' is the best mechanism I've heard of so far, though I wish it were
standardized, and the fact that it can't be portably assumed to be a string 
constant could be surprising.

2) If we want to make 'sh' a weak/dynamic reference, I think we should 
strongly consider arranging to make it available at '/bin/sh' when present. I 
expect this option would require less patching of other packages *by far* than 
any other approach.

3) If we want a dynamic 'sh' not located at '/bin/sh', I think we should 
implement a function similar to '__bionic_get_shell_path()' and use it for 
'_PATH_BSHELL', 'system', etc. That begs the question of how the function 
should find 'sh', and I don't have an answer for that. In principle, we could 
design a configuration mechanism for 'confstr(_CS_PATH, buf, sizeof(buf))' and 
use it to find the shell: that has some appeal, but making the mechanism 
extensible enough to support "all of the standard utilities of POSIX.1-2017" 
seems like a challenge.

What do you think?

-Philip

[1]: https://smlfamily.github.io/Basis/os-process.html#SIG:OS_PROCESS.system:VAL
[2]: https://lists.gnu.org/archive/html/help-guix/2021-11/msg00036.html
[3]: https://issues.guix.gnu.org/57050
[4]: https://android.googlesource.com/platform/bionic/+/master/libc/bionic/__bionic_get_shell_path.cpp
[5]: https://github.com/python/cpython/blob/8184f0fce3b734413e3d3a282f1425d3cb8507fd/Lib/subprocess.py#L1760-L1762
[6]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/system.html
[7]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sh.html
[8]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/confstr.html
[9]: https://issues.guix.gnu.org/57050#63
[10]: https://issues.guix.gnu.org/47180
[11]: https://man7.org/linux/man-pages/man3/system.3.html



* Re: What 'sh' should 'system' use?
  2022-09-19  0:13 What 'sh' should 'system' use? Philip McGrath
@ 2022-09-19  7:07 ` Liliana Marie Prikler
  2022-09-26  8:07   ` Philip McGrath
  2022-09-19 12:55 ` Maxime Devos
  2022-10-01 16:54 ` Ludovic Courtès
  2 siblings, 1 reply; 14+ messages in thread
From: Liliana Marie Prikler @ 2022-09-19  7:07 UTC (permalink / raw)
  To: Philip McGrath, guix; +Cc: Maxime Devos, Liliana Marie Prikler

Am Sonntag, dem 18.09.2022 um 20:13 -0400 schrieb Philip McGrath:
> On the other hand, even Nix puts '/bin/sh' at its usual path: we are
> really quite an outlier in this respect. (IIUC, Nix also has
> '/usr/bin/env', but no other utilities at FHS paths.)
We are not.  We provide both /bin/sh and /usr/bin/env.  If you're
talking about the build container then that's a much smaller
distinction.

> First, a test program I tried in [9] seemed to indicate that
> '_PATH_BSHELL' referred to 'bash-static', but 'system("echo $BASH")'
> referred to 'bash-minimal'. It's possible that my test gave an
> incorrect answer: I just tried 'guix size glibc' (I hadn't thought of
> that earlier), and it doesn't list a reference to 'bash-minimal'.
> But, if we are embedding references in libc to two different copies
> of Bash, that seems clearly bad.
We aren't embedding two references though; if we did, you'd see bash-
minimal in the closure.

> More broadly, I now think it would be better if we embedded zero
> references to copies of Bash in libc.
I don't think we can do that without breaking system.

> I have changed my mind on this before, and I could be persuaded
> otherwise.   When I wrote the Racket patch for '/bin/sh' that had
> been in place before the latest change, I initially was going to use
> a hard-coded Bash only when '/bin/sh' did not exist, but the
> discussion persuaded me it would make more sense to always use the
> 'sh' from the package's inputs.[10] For Racket, a dependency on 
> 'sh' didn't seem too unreasonable.
It certainly isn't the largest package racket pulls in, no.

> However, giving every program using Glibc a hard dependency on
> Bash—and on a particular Bash executable—seems like a much bigger
> imposition.
We're talking 1.7 MiB here.  Certainly a "big" imposition, but nothing
in comparison to the things you need in the store for bootstrapping
purposes.  Also note that bash-minimal, while only taking up 1.0 MiB
for itself, requires both glibc and gcc:lib, which apart from creating
a cycle does blow up its closure size quite a bit.

> I now think it would be better to find 'sh' dynamically at run time
Stop it.  Get some help.
> >        In versions of glibc before 2.1.3, [...] system()
> >        always returned 1 [...].
Note that always returning non-zero is required by POSIX 2017.

> >        [E]ven though POSIX.1-2001 requires a conforming
> >        implementation to provide a shell, that shell may not be
> >        available or executable if the calling program has
> >        previously called chroot(2) [...].
Which does nothing to aid us in actually shelling out.

> Finally, some possible courses of action:
> 
> 1) If we want to continue to hard-code a specific shell into Glibc, I
> think we should document the decision (for example, why 'bash-static'
> vs. 'bash-minimal'?) and recommendations for how packages should use
> it: '_PATH_BSHELL' 
> is the best mechanism I've heard of so far, though I wish it were 
> standardized, and the fact that it can't be portably assumed to be a
> string constant could be surprising.
Note, that _PATH_BSHELL is only required for programs that want to be
portable to other *nix systems.  For most programs written with only
the common Linux distros in mind, substituting "/bin/sh" is more than
enough in terms of compatibility.

> 2) If we want to make 'sh' a weak/dynamic reference, I think we
> should strongly consider arranging to make it available at '/bin/sh'
> when present. I expect this option would require less patching of
> other packages *by far* than any other approach.
How about no?

> 3) If we want a dynamic 'sh' not located at '/bin/sh', I think we
> should implement a function similar to '__bionic_get_shell_path()'
> and use it for '_PATH_BSHELL', 'system', etc. That begs the question
> of how the function should find 'sh', and I don't have an answer for
> that. In principle, we could design a configuration mechanism for
> 'confstr(_CS_PATH, buf, sizeof(buf))' and use it to find the shell:
> that has some appeal, but making the mechanism extensible enough to
> support "all of the standard utilities of POSIX.1-2017" seems like a
> challenge.
This sounds like a very long and convoluted way to hard-code a string.
Remember that _PATH_BSHELL ought to be resolved to a compile-time
pseudo-constant string.

> What do you think?
If you're really annoyed by the confstr thing, make it so that it hard-
codes the #$bash-static/bin.

Cheers



* Re: What 'sh' should 'system' use?
  2022-09-19  0:13 What 'sh' should 'system' use? Philip McGrath
  2022-09-19  7:07 ` Liliana Marie Prikler
@ 2022-09-19 12:55 ` Maxime Devos
  2022-09-26  7:04   ` Philip McGrath
  2022-10-01 16:54 ` Ludovic Courtès
  2 siblings, 1 reply; 14+ messages in thread
From: Maxime Devos @ 2022-09-19 12:55 UTC (permalink / raw)
  To: Philip McGrath, guix; +Cc: Liliana Marie Prikler, Liliana Marie Prikler


On 19-09-2022 02:13, Philip McGrath wrote:
> 1) If we want to continue to hard-code a specific shell into Glibc,

We do, for reproducibility -- otherwise, the behaviour of the 'system' 
function depends on whatever is the current /bin/sh, and sometimes 
/bin/sh is updated (and on some foreign systems it might not even be the 
bash we are used to).

> I think we
> should document the decision (for example, why 'bash-static' vs. 'bash-
> minimal'?)

Because cycles -- bash-minimal is linked to a (shared) glibc, which is a 
separate package from bash-minimal, so glibc cannot use bash-minimal, it 
uses bash-static instead which is linked to a (static) glibc (which 
might use a bootstrap bash (not 100% sure), but it's statically linked, 
so no reference to the bootstrap bash remains IIUC).

Also, why?  This is an implementation detail.  Who would the target 
audience be for this documentation?

> and recommendations for how packages should use it: '_PATH_BSHELL'
> is the best mechanism I've heard of so far, though I wish it were
> standardized, and the fact that it can't be portably assumed to be a string
> constant could be surprising.

I consider _not_ using it, and using (4) instead, to be best.
If not suitable (for example, because a shell is needed to run an actual 
shell script), then a plain "sh" looked up in the $PATH (like other 
binaries) and substitute*-ed by Guix should suffice.

> 
> 2) If we want to make 'sh' a weak/dynamic reference, I think we should
> strongly consider arranging to make it available at '/bin/sh' when present. I
> expect this option would require less patching of other packages *by far* than
> any other approach.

See (1) (reproducibility) -- also, you would need to modify the daemon 
for that, so there are compatibility concerns, and then we're stuck with 
the /bin/sh special case forever (unless breaking compatibility would 
later be considered acceptable).

> 
> 3) If we want a dynamic 'sh' not located at '/bin/sh', I think we should
> implement a function similar to '__bionic_get_shell_path()' and use it for
> '_PATH_BSHELL', 'system', etc. That begs the question of how the function
> should find 'sh', and I don't have an answer for that.

How about $PATH?

> In principle, we could
> design a configuration mechanism for 'confstr(_CS_PATH, buf, sizeof(buf))' and
> use it to find the shell: that has some appeal, but making the mechanism
> extensible enough to support "all of the standard utilities of POSIX.1-2017"
> seems like a challenge.
> What do you think?

(4) Stop using 'system' in applications -- instead use whatever the 
language's equivalent of Guile's system*, execl ... or Guix' 'invoke'. 
Why?  Because 'system'-like functions require quoting the command line
arguments whereas in 'system*'-like functions you could just pass a list 
of command line arguments, and it's easy to get the quoting wrong, 
especially if some of the arguments are generated dynamically.

As a bonus, this could often remove a dependency on bash{-minimal,-static,}.
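
To make the quoting point concrete, here is a small Python sketch, where
'subprocess.run' with an argument list plays the role of 'system*'. The file
name and the helper 'echo_argument' are made up for illustration, and
'shlex.split' only approximates how a POSIX shell splits words.

```python
import shlex
import subprocess
import sys


def echo_argument(arg):
    """Pass arg to a child process as one argv entry, with no shell involved."""
    child = "import sys; sys.stdout.write(sys.argv[1])"
    result = subprocess.run(
        [sys.executable, "-c", child, arg],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


filename = "my notes.txt"  # hypothetical, dynamically generated argument

# 'system'-style: the command is a single string, so the space splits the
# file name into two words unless the caller remembers to quote it.
naive_words = shlex.split("cat " + filename)
quoted_words = shlex.split("cat " + shlex.quote(filename))

# 'system*'-style: the argument list is passed through untouched.
received = echo_argument(filename)

print(naive_words)
print(quoted_words)
print(received)
```

The list form never needs 'shlex.quote' at all, which is the whole appeal.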

Maybe we can eventually remove _PATH_BSHELL and 'system' from our glibc 
(and Guile, ...).

Greetings,
Maxime.


* Re: What 'sh' should 'system' use?
  2022-09-19 12:55 ` Maxime Devos
@ 2022-09-26  7:04   ` Philip McGrath
  2022-09-26  9:41     ` Liliana Marie Prikler
  2022-09-26 12:24     ` Maxime Devos
  0 siblings, 2 replies; 14+ messages in thread
From: Philip McGrath @ 2022-09-26  7:04 UTC (permalink / raw)
  To: Maxime Devos, guix; +Cc: Liliana Marie Prikler, Liliana Marie Prikler

Hi,


On 9/19/22 08:55, Maxime Devos wrote:
> (4) Stop using 'system' in applications -- instead use whatever the 
> language's equivalent of Guile's system*, execl ... or Guix' 
> 'invoke'. Why?  Because 'system'-like functions require quoting the
>  command line arguments whereas in 'system*'-like functions you could
>  just pass a list of command line arguments, and it's easy to get the
>  quoting wrong, especially if some of the arguments are generated 
> dynamically.
> 
> As a bonus, this could often remove a dependency on 
> bash{-minimal,-static,}.
> 

I definitely advocate 'system*'-like functions in general. Still,
'system'-like functions exist: I'm advocating that Guix should
have a consistent answer for how such functions should behave.

(Very occasionally, a program really does want to invoke the shell, such
as when shell expansion is part of an existing API.)

From a different perspective, this is part of why I've recently been
thinking we should find 'sh' dynamically: most programs/environments
don't, and shouldn't, need bash{-minimal,-static}, so it seems wrong to
make it a mandatory dependency of libc.

On 9/19/22 08:55, Maxime Devos wrote:
> 
> On 19-09-2022 02:13, Philip McGrath wrote:
>> 1) If we want to continue to hard-code a specific shell into 
>> Glibc,
> 
> We do, for reproducibility -- otherwise, the behaviour of the 
> 'system' function depends on whatever is the current /bin/sh, and 
> sometimes /bin/sh is updated (and on some foreign systems it might 
> not even be the bash we are used to).
> 
> [...]
> 
>> 
>> 2) If we want to make 'sh' a weak/dynamic reference, I think we 
>> should strongly consider arranging to make it available at 
>> '/bin/sh' when present. I expect this option would require less 
>> patching of other packages *by far* than any other approach.
> 
> See (1) (reproducibility) -- also, you would need to modify the 
> daemon for that, so there are compatibility concerns, and then we're
>  stuck with the /bin/sh special case forever (unless breaking 
> compatibility would later be considered acceptable).
> 

I don't think there's a reproducibility problem. Guix already can create
reproducible containers with "/bin/sh" (e.g. 'guix shell coreutils
--container -- ls /bin') and without "/bin/sh" (as in package build
environments).

I haven't investigated whether adding the ability to create "/bin/sh" in
build containers would require modifying the daemon or just sending the
daemon different instructions. However, AIUI, Nix *always* creates
"/bin/sh" in build containers, which makes me further expect that any
change needed to the daemon would be small.

To be clear, I'm not proposing that we always create "/bin/sh" in build
containers. At a low level, I'm suggesting that we add the ability to
create "/bin/sh" when desired. I can imagine one possibility for a
high-level interface would be to create "/bin/sh" by default when an
input provides "bin/sh", and it might turn out that we end up wanting
"/bin/sh" in most or all build containers in practice, but I see those
as secondary questions.

>> and recommendations for how packages should use it: '_PATH_BSHELL' 
>> is the best mechanism I've heard of so far, though I wish it were 
>> standardized, and the fact that it can't be portably assumed to be
>> a string constant could be surprising.
> 
> I consider _not_ using it, and using (4) instead, to be best. If not
>  suitable (for example, because a shell is needed to run an actual 
> shell script), then a plain "sh" looked up in the $PATH (like other 
> binaries) and substitute*-ed by Guix should suffice.
> 

As I said above, I agree that 'system*' should be preferred over
'system' when possible.

There are a few dimensions here that I want to try to pick apart.

When you say:

> a plain "sh" looked up in the $PATH (like other binaries) and 
> substitute*-ed by Guix should suffice

there are a few different things that might mean.

I think you're probably referring to the status quo, where "sh" is
looked up in the 'inputs' or a G-expression equivalent and an absolute
reference to that particular "sh" is embedded into the package being
built. (But, when cross-compiling, that "sh" would not be in the $PATH
in the build environment.)

There's a different question about $PATH vs. _CS_PATH that I'll address
later.

I see at least two reasons to prefer finding "sh" dynamically at run-time.

First, we have other POSIX-like shells packaged in Guix, such as dash,
zsh, and gash. Currently, to create an environment where one of these
shells is used to run 'system'-like functions (e.g. because dash is
supposed to be faster than bash), you would have to recompile everything
that depends on glibc. (Maybe you could craft a very ugly graft.)

Second, sometimes people may want to create environments, images, etc.
without an "sh" available. In some sense this is a special case of using
an alternate shell, but the consequences of the status quo are
especially notable. Currently, practically any environment or image Guix
creates will include bash-static, because glibc refers to it.

For an especially ironic example, consider this note from `info
"(guix)Invoking guix pack"`:


> Note: Singularity _requires_ you to provide ‘/bin/sh’ in the image.
> For that reason, ‘guix pack -f squashfs’ always implies ‘-S
> /bin=bin’.  Thus, your ‘guix pack’ invocation must always start with
> something like:
> 
>     guix pack -f squashfs bash ...
> 
> If you forget the ‘bash’ (or similar) package, ‘singularity run’ and
> ‘singularity exec’ will fail with an unhelpful “no such file or
> directory” message.

Running `guix pack -f squashfs hello` warns you about the lack of a
shell, and indeed the resulting image doesn't contain "/bin/sh" ... but
it does contain
"/gnu/store/720rj90bch716isd8z7lcwrnvz28ap4y-bash-static-5.1.8/bin/sh"!

Furthermore, if you run `guix pack -f squashfs hello bash-static`, the
resulting image contains both
"/gnu/store/720rj90bch716isd8z7lcwrnvz28ap4y-bash-static-5.1.8/bin/sh"
and "/gnu/store/4f304c7dp68hkcp1zi1i07zm8nfvvyp7-bash-static-5.1.8/bin/sh".

> 
>> 
>> 3) If we want a dynamic 'sh' not located at '/bin/sh', I think we 
>> should implement a function similar to '__bionic_get_shell_path()'
>>  and use it for '_PATH_BSHELL', 'system', etc. That begs the 
>> question of how the function should find 'sh', and I don't have an
>>  answer for that.
> 
> How about $PATH?
> 

This is a subtle point, and it depends in some ways on what you are
trying to use the 'sh' for. From the "Rationale" in the POSIX spec for
'confstr' 
<https://pubs.opengroup.org/onlinepubs/9699919799/functions/confstr.html>:

> The original need for this function was to provide a way of finding
> the configuration-defined default value for the environment variable
> PATH. Since PATH can be modified by the user to include directories
> that could contain utilities replacing the standard utilities in the
> Shell and Utilities volume of POSIX.1-2017, applications need a way
> to determine the system-supplied PATH environment variable value that
> contains the correct search path for the standard utilities.

I don't have a strong view about the merits of using PATH or not in 
general, and, again 'confstr' with '_CS_PATH' doesn't currently give a 
useful result on Guix.

For 'sh' specifically, though, there's some set of programs that look at 
$SHELL or /etc/passwd or other mechanisms for a highly-configurable 
choice of shell: those aren't relevant here. This question concerns a 
different set of programs that are looking for a reliable plain-vanilla 
'sh': this may be configured at the level of the environment (OS, 
container, chroot, etc.)---including configuring it not to exist---but 
it's a less fine-grained sort of configuration, and there's a stronger 
expectation that it will be a POSIX-like 'sh' (not fish or 
/usr/sbin/nologin).

It seems POSIX would like 'sh' to be found using '_CS_PATH', but I don't 
know of any programs that actually do that, and it doesn't work on Guix.

Programs in practice seem to look at "/bin/sh", and environments
configure it by choosing what (possibly nothing) to put at "/bin/sh"
from the perspective of programs in that environment.

>> I think we should document the decision (for example, why 
>> 'bash-static' vs. 'bash- minimal'?)
> 
> Because cycles -- bash-minimal is linked to a (shared) glibc, which 
> is a separate package from bash-minimal, so glibc cannot use 
> bash-minimal, it uses bash-static instead which is linked to a 
> (static) glibc (which might use a bootstrap bash (not 100% sure), but
> it's statically linked, so no reference to the bootstrap bash remains
> IIUC).
> 
> Also, why?  This is an implementation detail.  Who would the target 
> audience be for this documentation?
> 

I don't mean "document the decision" to necessarily imply something 
elaborate or formal, but I think the next person packaging a language 
with a function like 'system' in its standard library shouldn't have to 
reevaluate these questions from scratch. Also, if we decided the right 
thing were to advocate for upstreams to do something differently for the 
sake of portability (e.g. trying to get people to use _CS_PATH---which 
I'm not suggesting), it would help to have a rationale to point to.

Specifically with respect to bash-minimal vs. bash-static, I'm still not 
clear on when I should use which. The package descriptions are 
identical, and I haven't found a clear (to me, at least) explanation in 
the source code comments. For example, if bash-static is needed to avoid 
a cycle as you say, what is the benefit of also having bash-minimal?

-Philip



* Re: What 'sh' should 'system' use?
  2022-09-19  7:07 ` Liliana Marie Prikler
@ 2022-09-26  8:07   ` Philip McGrath
  2022-09-26 10:04     ` Liliana Marie Prikler
  0 siblings, 1 reply; 14+ messages in thread
From: Philip McGrath @ 2022-09-26  8:07 UTC (permalink / raw)
  To: Liliana Marie Prikler, guix; +Cc: Maxime Devos, Liliana Marie Prikler

Hi,

On 9/19/22 03:07, Liliana Marie Prikler wrote:
> Am Sonntag, dem 18.09.2022 um 20:13 -0400 schrieb Philip McGrath:
>> On the other hand, even Nix puts '/bin/sh' at its usual path: we
>> are really quite an outlier in this respect. (IIUC, Nix also has 
>> '/usr/bin/env', but no other utilities at FHS paths.)
> We are not.  We provide both /bin/sh and /usr/bin/env.  If you're 
> talking about the build container then that's a much smaller 
> distinction.
> 

Yes, I'm talking about the build container. But for the build container,
programs/libraries that use "/bin/sh" would work unmodified.

>> More broadly, I now think it would be better if we embedded zero
>> references to copies of Bash in libc.
> I don't think we can do that without breaking system.
> 

When "/bin/sh" is not available at runtime, I think libc's `system`
ought to return 127, and other `system`-like functions should raise
exceptions or whatever the idiomatic way is to signal failure. Of
course, we will presumably need to make "/bin/sh" available in many more
places, but I don't think it's surprising for programs that need to run
shell commands to fail in the absence of a shell.
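
A minimal sketch of that behaviour for a 'system'-like function, written in
Python. The function name 'run_shell_command' and its 'shell_path' parameter
are hypothetical; the point is only that a missing shell is reported as an
error instead of being hidden.

```python
import os
import subprocess


def run_shell_command(command, shell_path="/bin/sh"):
    """Run command with the shell at shell_path and return its exit status.

    Raises FileNotFoundError if there is no executable file at shell_path.
    """
    if not (os.path.isfile(shell_path) and os.access(shell_path, os.X_OK)):
        raise FileNotFoundError(f"no executable shell at {shell_path!r}")
    return subprocess.run([shell_path, "-c", command]).returncode


if __name__ == "__main__":
    try:
        print(run_shell_command("exit 3"))
    except FileNotFoundError as err:
        print("cannot run shell command:", err)
```

A program that truly needs the shell can catch the exception and explain
what is missing; one that does not need it never reaches this code path.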

>> However, giving every program using Glibc a hard dependency on 
>> Bash—and on a particular Bash executable—seems like a much bigger 
>> imposition.
> We're talking 1.7 MiB here.  Certainly a "big" imposition, but
> nothing in comparison to the things you need in the store for
> bootstrapping purposes.  Also note that bash-minimal, while only
> taking up 1.0 MiB for itself, requires both glibc and gcc:lib, which
> apart from creating a cycle does blow up its closure size quite a
> bit.
> 

I'm less concerned with the literal size than with the significance of
putting a specific shell so near the root of most dependency graphs: I
tried to give examples in my reply to Maxime, like creating containers
without a shell.

>>> In versions of glibc before 2.1.3, [...] system() always returned
>>> 1 [...].
> Note that always returning non-zero is required by POSIX 2017.
> 

To quote the whole paragraph from
<https://pubs.opengroup.org/onlinepubs/9699919799/functions/system.html>:

> Note that, system(NULL) is required to return non-zero, indicating
> that there is a command language interpreter. At first glance, this
> would seem to conflict with the ISO C standard which allows
> system(NULL) to return zero. There is no conflict, however. A system
> must have a command language interpreter, and is non-conforming if
> none is present. It is therefore permissible for the system()
> function on such a system to implement the behavior specified by the
> ISO C standard as long as it is understood that the implementation
> does not conform to POSIX.1-2017 if system(NULL) returns zero.

I understand that to mean that `system(NULL)` returning zero indicates 
that the program is not (currently) running in a POSIX.1-2017 
environment. Guix creates many environments that do not conform to 
POSIX.1-2017: for example, any environment without `vi`.
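
As a rough sketch, a program can make that probe before relying on the
shell. The function name below is hypothetical. The man page excerpt
quoted above implies that glibc versions from 2.1.3 on perform a real
check rather than always answering 1.

```c
#include <stdlib.h>

/* Hypothetical helper: returns 1 if the C library reports that a
   command language interpreter is available, 0 otherwise.  ISO C
   allows either answer; POSIX.1-2017 requires nonzero on a
   conforming system. */
int have_command_processor(void)
{
    return system(NULL) != 0;
}
```

A program that gets 0 here can skip the call to `system` and report the
missing shell directly instead of decoding a wait status afterwards.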

-Philip


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: What 'sh' should 'system' use?
  2022-09-26  7:04   ` Philip McGrath
@ 2022-09-26  9:41     ` Liliana Marie Prikler
  2022-09-26 12:24     ` Maxime Devos
  1 sibling, 0 replies; 14+ messages in thread
From: Liliana Marie Prikler @ 2022-09-26  9:41 UTC (permalink / raw)
  To: Philip McGrath, Maxime Devos, guix; +Cc: Liliana Marie Prikler

Hi

Am Montag, dem 26.09.2022 um 03:04 -0400 schrieb Philip McGrath:
> I definitely advocate 'system*'-like functions in general. Still,
> 'system'-like functions exist: I'm advocating that Guix should
> have a consistent answer for how such functions should behave.
How is the current answer inconsistent?

> From a different perspective, this is part of why I've recently been
> thinking we should find 'sh' dynamically: most programs/environments
> don't, and shouldn't, need bash{-minimal,-static}, so it seems wrong
> to make it a mandatory dependency of libc.
Wrong in which sense?  Technically, morally, philosophically?  I don't
think any of the values upheld by the GNU project, such as the four
freedoms or the FSDG, nor our code of conduct are violated by having
bash-static in libc.

> I think you're probably referring to the status quo, where "sh" is
> looked up in the 'inputs' or a G-expression equivalent and an
> absolute reference to that particular "sh" is embedded into the
> package being built. (But, when cross-compiling, that "sh" would not
> be in the $PATH in the build environment.)
You still get an implicit bash-minimal in native-inputs.  It's just not
a regular input.

> First, we have other POSIX-like shells packaged in Guix, such as
> dash, zsh, and gash. Currently, to create an environment where one of
> these shells is used to run 'system'-like functions (e.g. because
> dash is supposed to be faster than bash), you would have to recompile
> everything that depends on glibc. (Maybe you could craft a very ugly
> graft.)
The performance benefits of dash are irrelevant when you compare it to
fork and exec.  Thus I highly question the point you're trying to make.

> Second, sometimes people may want to create environments, images,
> etc. without an "sh" available. In some sense this is a special case
> of using an alternate shell, but the consequences of the status quo
> are especially notable. Currently, practically any environment or
> image Guix creates will include bash-static, because glibc refers to
> it.
And yet, this bash-static will only be inside the container; with even
its exact file name unknown and outside of PATH (and even _CS_PATH
while we're at it).  If your concern is that an attacker might break
your containerized application and do arbitrary code execution in bash
afterwards, I think you got your priorities mixed up; said attacker
could probably side-load a static bash anyway.  And I highly doubt that
any concern not related to security is critical either.

> Programs in practice seem to look at "/bin/sh", and environments 
> configure it by choosing what (possibly nothing) to put at
> "/bin/sh" from the perspective of programs in that environment.
I mean, both are valid solutions.  You're not going to put an
unreliable shell as /bin/sh or attempt to shadow sh in your $PATH. 
confstr and _CS_PATH are for paranoid people who believe you might (and
even if you do use it, how sure are you that you're not just getting
/bin/sh).

> > > I think we should document the decision (for example, why 
> > > 'bash-static' vs. 'bash-minimal'?)
> > 
> > Because cycles -- bash-minimal is linked to a (shared) glibc, which
> > is a separate package from bash-minimal, so glibc cannot use 
> > bash-minimal, it uses bash-static instead which is linked to a 
> > (static) glibc (which might use a bootstrap bash (not 100% sure),
> > but it's statically linked, so no reference to the bootstrap bash
> > remains IIUC).
> > 
> > Also, why?  This is an implementation detail.  Who would the target
> > audience be for this documentation?
> > 
> 
> I don't mean "document the decision" to necessarily imply something 
> elaborate or formal, but I think the next person packaging a language
> with a function like 'system' in its standard library shouldn't have
> to reevaluate these questions from scratch. Also, if we decided the
> right thing were to advocate for upstreams to do something
> differently for the sake of portability (e.g. trying to get people to
> use _CS_PATH---which I'm not suggesting), it would help to have a
> rationale to point to.
> 
> Specifically with respect to bash-minimal vs. bash-static, I'm still
> not clear on when I should use which. 
You're not going to need bash-static.  For most intents and purposes,
you can ignore its existence.  In fact, if it bothers you that much, I
suggest hiding it like gcc.

> The package descriptions are identical, and I haven't found a clear
> (to me, at least) explanation in the source code comments. For
> example, if bash-static is needed to avoid a cycle as you say, what
> is the benefit of also having bash-minimal?
bash-minimal is to be used in shell wrappers, as they don't need a
full-blown bash (with among others the ability to load extensions,
which bash-minimal lacks).  Unlike bash-static, bash-minimal can be
grafted (both itself and its dependents), so fixing a safety-critical
bug in any of those does not cause a world rebuild.

Cheers


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: What 'sh' should 'system' use?
  2022-09-26  8:07   ` Philip McGrath
@ 2022-09-26 10:04     ` Liliana Marie Prikler
  0 siblings, 0 replies; 14+ messages in thread
From: Liliana Marie Prikler @ 2022-09-26 10:04 UTC (permalink / raw)
  To: Philip McGrath, guix; +Cc: Maxime Devos, Liliana Marie Prikler

Hi,

Am Montag, dem 26.09.2022 um 04:07 -0400 schrieb Philip McGrath:
> Hi,
> 
> On 9/19/22 03:07, Liliana Marie Prikler wrote:
> > Am Sonntag, dem 18.09.2022 um 20:13 -0400 schrieb Philip McGrath:
> > > On the other hand, even Nix puts '/bin/sh' at its usual path: we
> > > are really quite an outlier in this respect. (IIUC, Nix also has 
> > > '/usr/bin/env', but no other utilities at FHS paths.)
> > We are not.  We provide both /bin/sh and /usr/bin/env.  If you're 
> > talking about the build container then that's a much smaller 
> > distinction.
> > 
> 
> Yes, I'm talking about the build container. But for the build
> container, programs/libraries that use "/bin/sh" would work
> unmodified.
I think there's limited value in having them work unmodified; see
‘patch-source-shebangs’.

> > > More broadly, I now think it would be better if we embedded zero 
> > > references to copies of Bash in libc.
> > I don't think we can do that without breaking system.
> > 
> 
> When "/bin/sh" is not available at runtime, I think libc's `system`
> ought to return 127, and other `system`-like functions should raise
> exceptions or whatever the idiomatic way is to signal failure. Of
> course, we will presumably need to make "/bin/sh" available in many
> more places, but I don't think it's surprising for programs that need
> to run shell commands to fail in the absence of a shell.
Au contraire, I'd argue that people who use system will be the most
surprised when it actually does fail.

> > > However, giving every program using Glibc a hard dependency on 
> > > Bash—and on a particular Bash executable—seems like a much bigger
> > > imposition.
> > We're talking 1.7 MiB here.  Certainly a "big" imposition, but
> > nothing in comparison to the things you need in the store for
> > bootstrapping purposes.  Also note that bash-minimal, while only
> > taking up 1.0 MiB for itself, requires both glibc and gcc:lib,
> > which apart from creating a cycle does blow up its closure size
> > quite a bit.
> > 
> 
> I'm less concerned with the literal size than with the significance
> of putting a specific shell so near the root of most dependency
> graphs: I tried to give examples in my reply to Maxime, like creating
> containers without a shell.
What is this significance?  From the examples you gave Maxime, I find
it insignificant.

> > > 
> > It is therefore permissible for the system() function on such a
> > system to implement the behavior specified by the ISO C standard as
> > long as it is understood that the implementation does not conform
> > to POSIX.1-2017 if system(NULL) returns zero.
> 
> I understand that to mean that `system(NULL)` returning zero
> indicates that the program is not (currently) running in a POSIX.1-
> 2017 environment.
This test is severely broken.  It fails to account for non-POSIX.1-2017
systems, that nevertheless return 1.

From the GNU coding standards [1]:
> The GNU Project regards standards published by other organizations as
> suggestions, not orders. We consider those standards, but we do not
> “obey” them. In developing a GNU program, you should implement an
> outside standard’s specifications when that makes the GNU system
> better overall in an objective sense. When it doesn’t, you shouldn’t.
Here, conforming to POSIX makes sense: it improves portability at
little cost.

> Guix creates many environments that do not conform to POSIX.1-2017:
> for example, any environment without `vi`.
Here it doesn't.  The convenience of vi is highly debatable.

Cheers


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: What 'sh' should 'system' use?
  2022-09-26  7:04   ` Philip McGrath
  2022-09-26  9:41     ` Liliana Marie Prikler
@ 2022-09-26 12:24     ` Maxime Devos
  1 sibling, 0 replies; 14+ messages in thread
From: Maxime Devos @ 2022-09-26 12:24 UTC (permalink / raw)
  To: Philip McGrath, guix; +Cc: Liliana Marie Prikler, Liliana Marie Prikler


[-- Attachment #1.1.1: Type: text/plain, Size: 9599 bytes --]



On 26-09-2022 09:04, Philip McGrath wrote:
> [...]
> (Very occasionally, a program really does want to invoke the shell, such
> as when shell expansion is part of an existing API.)
> 
> From a different perspective, this is part of why I've recently been
> thinking we should find 'sh' dynamically: most programs/environments
> don't, and shouldn't, need bash{-minimal,-static}, so it seems wrong to
> make it a mandatory dependency of libc.

In another thread, I proposed replacing 'system' by a macro 'system' 
that looks for a, say, GUIX_BIN_SH preprocessor definition and then 
calls _guix_system(GUIX_BIN_SH,...) or such, and remove the 'system' 
function.

That way, glibc does not use bash-whatever anymore, but we still avoid 
doing things dynamically, avoiding the problems that dynamic finding 
entails.

For packages that use 'system', we would then need to resolve the 
resulting build failure by passing -DGUIX_BIN_SH (maybe we could have a 
libc-system-function package that overrides the header containing 
'system' and automatically sets GUIX_BIN_SH)?
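
A minimal sketch of the idea, under several assumptions: GUIX_BIN_SH and
`_guix_system` are the names from the proposal, while
`sketch_system_with_shell` is a made-up stand-in for the latter. The
fallback definition of GUIX_BIN_SH exists only so the sketch compiles on
its own; in the proposal a missing definition would be a build failure.
The signal handling that a real `system` performs is left out.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for `_guix_system`: run COMMAND with the shell
   at SHELL_PATH and return a wait status, like system() does. */
int sketch_system_with_shell(const char *shell_path, const char *command)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        execl(shell_path, "sh", "-c", command, (char *)NULL);
        _exit(127);             /* the shell could not be executed */
    }
    int status;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return status;
}

/* Assumption: the package build passes -DGUIX_BIN_SH='"<store path>/bin/sh"'.
   This fallback is only here to keep the sketch self-contained. */
#ifndef GUIX_BIN_SH
#define GUIX_BIN_SH "/bin/sh"
#endif

/* Defined after <stdlib.h> so that later calls to system() expand to
   the helper with the shell chosen at build time. */
#define system(command) sketch_system_with_shell(GUIX_BIN_SH, (command))
```

With this in place, the shell path is a property of each package that
calls `system` rather than of libc, which is the point of the proposal.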

>> See (1) (reproducibility) -- also, you would need to modify the daemon for that, so there are compatibility concerns, and then we're
>>  stuck with the /bin/sh special case forever (unless breaking compatibility would later be considered acceptable).
>>
> 
> I don't think there's a reproducibility problem.

You are proposing 'weak references' -- weak references are automatically 
broken if the thing referred to is GC'ed (the weak reference is weak, so 
it doesn't count as a reference that keeps it from being GC'ed).

That means that the build process depends on whether bash-whatever is in 
the store or not.

Even if not, the compatibility concerns remain, and incompatible daemons 
sound like a form of irreproducibility to me.

> Guix already can create
> reproducible containers with "/bin/sh" (e.g. 'guix shell coreutils
> --container -- ls /bin') and without "/bin/sh" (as in package build
> environments).
> 
> I haven't investigated whether adding the ability to create "/bin/sh" in
> build containers would require modifying the daemon or just sending the
> daemon different instructions. However, AIUI, Nix *always* creates
> "/bin/sh" in build containers, which makes me further expect that any
> change needed to the daemon would be small.
>
> To be clear, I'm not proposing that we always create "/bin/sh" in build
> containers. At a low level, I'm suggesting that we add the ability to
> create "/bin/sh" when desired. I can imagine one possibility for a
> high-level interface would be to create "/bin/sh" by default when an
> input provides "bin/sh", and it might turn out that we end up wanting
> "/bin/sh" in most or all build containers in practice, but I see those
> as secondary questions.

Again, I don't see how special-casing /bin/sh even further is desirable.

> There are a few dimensions here that I want to try to pick apart.
> 
> When you say:
> 
>> a plain "sh" looked up in the $PATH (like other binaries) and substitute*-ed by Guix should suffice
>
> there are a few different things that might mean.
>
> I think you're probably referring to the status quo, where "sh" is
> looked up in the 'inputs' or a G-expression equivalent and an absolute
> reference to that particular "sh" is embedded into the package being
> built. (But, when cross-compiling, that "sh" would not be in the $PATH
> in the build environment.)

Yes -- to be clear, the looking up in $PATH is for upstream, in Guix it 
would be patched with the absolute reference to something in 'inputs' 
instead.

> There's a different question about $PATH vs. _CS_PATH that I'll address
> later.
> 
> I see at least two reasons to prefer finding "sh" dynamically at run-time.
> 
> First, we have other POSIX-like shells packaged in Guix, such as dash,
> zsh, and gash. Currently, to create an environment where one of these
> shells is used to run 'system'-like functions (e.g. because dash is
> supposed to be faster than bash), you would have to recompile everything
> that depends on glibc. (Maybe you could craft a very ugly graft.)

dash, zsh and gash are incompatible, so you can't simply replace things 
-- looking it up dynamically would potentially introduce bugs. 
Additionally, 'sh' might not exist in /bin/sh or $PATH, so possibly it 
couldn't be found dynamically, and possibly the version it finds is 
incompatible (reproducibility).

If dash is faster than bash and sufficiently compatible, you can propose to

> Second, sometimes people may want to create environments, images, etc.
> without an "sh" available.

You can do this without dynamic finding and its downsides, see e.g. the 
preprocessor thing mentioned in the beginning.

> In some sense this is a special case of using
> an alternate shell, but the consequences of the status quo are
> especially notable. Currently, practically any environment or image Guix
> creates will include bash-static, because glibc refers to it.
> 
> For an especially ironic example, consider this note from `info
> "(guix)Invoking guix pack"`:
> 
> [...]

That example is about installing something in /bin/sh, it's unrelated to 
'system' AFAICT.

>>>
>>> 3) If we want a dynamic 'sh' not located at '/bin/sh', I think we should implement a function similar to '__bionic_get_shell_path()'
>>>  and use it for '_PATH_BSHELL', 'system', etc. That begs the question of how the function should find 'sh', and I don't have an
>>>  answer for that.
>>
>> How about $PATH?
>>
> 
> This is a subtle point, and it depends in some ways on what you are
> trying to use the 'sh' for. From the "Rationale" in the POSIX spec for
> 'confstr' <https://pubs.opengroup.org/onlinepubs/9699919799/functions/confstr.html>:
> 
>> The original need for this function was to provide a way of finding
>> the configuration-defined default value for the environment variable
>> PATH. Since PATH can be modified by the user to include directories
>> that could contain utilities replacing the standard utilities in the
>> Shell and Utilities volume of POSIX.1-2017, applications need a way
>> to determine the system-supplied PATH environment variable value that
>> contains the correct search path for the standard utilities.

Guix likes users being able to replace things, so $PATH seems more 
desirable here than _CS_PATH (the latter being more difficult to modify 
or install things in) -- the 'system-supplied PATH' should be whatever 
the user wants it to be.
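
For reference, this is roughly what querying _CS_PATH looks like. It is
a sketch with a made-up function name, and as noted elsewhere in this
thread the value it returns is not useful on Guix today.

```c
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: returns a malloc'ed copy of the default search
   path reported by confstr(_CS_PATH), or NULL if the value is
   undefined or memory runs out.  The caller frees the result. */
char *default_utility_path(void)
{
    /* With a NULL buffer, confstr returns the size needed, including
       the terminating null byte, or 0 if there is no value. */
    size_t len = confstr(_CS_PATH, NULL, 0);
    if (len == 0)
        return NULL;

    char *buf = malloc(len);
    if (buf == NULL)
        return NULL;

    if (confstr(_CS_PATH, buf, len) == 0) {
        free(buf);
        return NULL;
    }
    return buf;
}
```

A program following the POSIX recommendation would search the
directories in this string for 'sh' instead of consulting $PATH.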

> I don't have a strong view about the merits of using PATH or not in general, and, again 'confstr' with '_CS_PATH' doesn't currently give a useful result on Guix.
> 
> For 'sh' specifically, though, there's some set of programs that look at $SHELL or /etc/passwd or other mechanisms for a highly-configurable choice of shell: those aren't relevant here. This question concerns a different set of programs that are looking for a reliable plain-vanilla 'sh': this may be configured at the level of the environment (OS, container, chroot, etc.)---including configuring it not to exist---but it's a less fine-grained sort of configuration, and there's a stronger expectation that it will be a POSIX-like 'sh' (not fish or /usr/sbin/nologin).

I don't see an argument against $PATH or for _PATH_BSHELL or _CS_PATH here.

> It seems POSIX would like 'sh' to be found using '_CS_PATH',

How is this relevant?

> but I don't know of any programs that actually do that, and it doesn't work on Guix.

If it doesn't work, we can make it work -- in the initial e-mail, you 
are proposing changes, so I don't think 'it (currently) doesn't work' 
counts as an argument.

> 
> Programs in practice seem to look at "/bin/sh", and environments configure it by choosing what (possibly nothing) to put at "/bin/sh" from the perspective of programs in that environment. 

In the initial e-mail, you were among other things asking what mechanism 
programs and libraries should use.  Now, you are mentioning what 
programs are currently using (*), and presenting it as an argument. 
This is rather cyclic.

(*) This is not true for appropriately patched programs and libraries in 
Guix, e.g. glibc and racket.

> I don't mean "document the decision" to necessarily imply something elaborate or formal, but I think the next person packaging a language with a function like 'system' in its standard library shouldn't have to reevaluate these questions from scratch. Also, if we decided the right thing were to advocate for upstreams to do something differently for the sake of portability (e.g. trying to get people to use _CS_PATH---which I'm not suggesting), it would help to have a rationale to point to.

OK, though I think the answer is: don't do that, 'system' is prone to 
errors, implement interfaces like 'system*' instead.
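
To make the difference concrete, here is a sketch of a 'system*'-style
call in C: the program and its arguments are passed as a vector and no
shell is involved. The name `run_argv` is hypothetical; `posix_spawnp`
is the standard POSIX call.

```c
#include <errno.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: run the program named by argv[0] (looked up in
   PATH) with the given NULL-terminated argument vector, without going
   through a shell.  Returns the wait status, or -1 on error. */
int run_argv(char *const argv[])
{
    pid_t pid;
    int err = posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ);
    if (err != 0) {
        errno = err;            /* posix_spawnp returns the error number */
        return -1;
    }
    int status;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return status;
}
```

Because nothing is parsed by a shell, an argument such as "a; b" or
"$HOME" reaches the program verbatim, which removes the quoting errors
that make 'system' fragile and also removes the need to find 'sh'.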

> Specifically with respect to bash-minimal vs. bash-static, I'm still not clear on when I should use which. The package descriptions are identical, and I haven't found a clear (to me, at least) explanation in the source code comments. For example, if bash-static is needed to avoid a cycle as you say, what is the benefit of also having bash-minimal? 

'bash-minimal'.  'bash-static' is a hack to resolve the cycle, hence 
something to be avoided where possible and preferably eventually 
eliminated (e.g. with preprocessor tricks or by dynamic finding).

Greetings,
Maxime.




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: What 'sh' should 'system' use?
  2022-09-19  0:13 What 'sh' should 'system' use? Philip McGrath
  2022-09-19  7:07 ` Liliana Marie Prikler
  2022-09-19 12:55 ` Maxime Devos
@ 2022-10-01 16:54 ` Ludovic Courtès
  2022-10-15 23:23   ` Philip McGrath
  2 siblings, 1 reply; 14+ messages in thread
From: Ludovic Courtès @ 2022-10-01 16:54 UTC (permalink / raw)
  To: Philip McGrath
  Cc: guix, Maxime Devos, Liliana Marie Prikler, Liliana Marie Prikler

Hello!

Philip McGrath <philip@philipmcgrath.com> skribis:

> 1) If we want to continue to hard-code a specific shell into Glibc, I think we 
> should document the decision (for example, why 'bash-static' vs.
> 'bash-minimal'?) […]

The choice of ‘bash-static’ rather than ‘bash-minimal’ is motivated by
the fact that, in (gnu packages commencement), we want to make sure
‘glibc-final’ does not retain references to its build-time environment.
See #:allowed-references in ‘glibc-final’.

> 2) If we want to make 'sh' a weak/dynamic reference, I think we should 
> strongly consider arranging to make it available at '/bin/sh' when present. I 
> expect this option would require less patching of other packages *by far* than 
> any other approach.

This is not a viable option because build containers lack /bin/sh.

Overall, I think the current situation is a reasonable tradeoff.  It
forces us to do some patching, indeed, but I think that’s acceptable:
we’re talking about a handful of packages.

WDYT?

Ludo’.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: What 'sh' should 'system' use?
  2022-10-01 16:54 ` Ludovic Courtès
@ 2022-10-15 23:23   ` Philip McGrath
  2022-10-16  7:04     ` Liliana Marie Prikler
  2022-10-19 15:30     ` Ludovic Courtès
  0 siblings, 2 replies; 14+ messages in thread
From: Philip McGrath @ 2022-10-15 23:23 UTC (permalink / raw)
  To: Ludovic Courtès
  Cc: guix, Maxime Devos, Liliana Marie Prikler, Liliana Marie Prikler

[-- Attachment #1: Type: text/plain, Size: 3443 bytes --]

On Saturday, October 1, 2022 12:54:27 PM EDT Ludovic Courtès wrote:
> Hello!
> 
> Philip McGrath <philip@philipmcgrath.com> skribis:
> > 1) If we want to continue to hard-code a specific shell into Glibc, I
> > think we should document the decision (for example, why 'bash-static' vs.
> > 'bash-minimal'?) […]
> 
> The choice of ‘bash-static’ rather than ‘bash-minimal’ is motivated by
> the fact that, in (gnu packages commencement), we want to make sure
> ‘glibc-final’ does not retain references to its build-time environment.
> See #:allowed-references in ‘glibc-final’.
> 

This makes sense as far as using 'bash-static' in Glibc. The aspects I'm unsure
of are:

 1. If I'm packaging software that implements a function like 'system'
    (e.g. Racket, SML/NJ, Chez Scheme, etc.), should I use 'bash-minimal' or
    'bash-static'?

 2. Do we really need 'bash-minimal' at all? Why not just replace it with
    'bash-static'?

In particular, AFAICT, 'bash-minimal' currently has a reference to
'bash-static' via Glibc:

--8<---------------cut here---------------start------------->8---
$ guix size bash-minimal 
store item                                                       total    self
/gnu/store/5h2w4qi9hk1qzzgi1w83220ydslinr4s-glibc-2.33              38.3    36.6  50.4%
/gnu/store/094bbaq6glba86h1d4cj16xhdi6fk2jl-gcc-10.3.0-lib          71.7    33.4  45.9%
/gnu/store/720rj90bch716isd8z7lcwrnvz28ap4y-bash-static-5.1.8        1.7     1.7   2.3%
/gnu/store/chfwin3a4qp1znnpsjbmydr2jbzk0d6y-bash-minimal-5.1.8      72.7     1.0   1.4%
total: 72.7 MiB
--8<---------------cut here---------------end--------------->8---

> > 2) If we want to make 'sh' a weak/dynamic reference, I think we should
> > strongly consider arranging to make it available at '/bin/sh' when
> > present. I expect this option would require less patching of other
> > packages *by far* than any other approach.
> 
> This is not a viable option because build containers lack /bin/sh.
> 

Right, this option would depend on making /bin/sh exist in the build
environment.

I'd hoped this might be possible without having to change the daemon, but the
ways I've tried so far haven't worked. I tried `(mkdir-p "/bin")`, but the
build user apparently doesn't have sufficient permissions. Then I tried
creating a nested container using `call-with-container` in which I could
bind-mount the directory from 'bash-static' at '/bin', but I hit permissions
errors that way, too. I also thought there might be a way to pass the daemon
options like 'build-chroot-dirs' to have it set up /bin/sh before it drops
privileges, but I couldn't figure out how to do that.

> Overall, I think the current situation is a reasonable tradeoff.  It
> forces us to do some patching, indeed, but I think that’s acceptable:
> we’re talking about a handful of packages.
> 
> WDYT?
> 
> Ludo’.

The patching itself isn't so bad, and, as you say, it's at least limited to
a relatively small number of packages. However, the fact that Glibc retains a
reference to 'bash-static' affects nearly every package. It doesn't affect them
very much, to be sure! But I think it does prevent using
`guix shell --container` to create containers without a shell, and it likewise
seems difficult to experiment with different shells. Or maybe it's really just
that it disturbs my sense of aesthetics.

-Philip


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: What 'sh' should 'system' use?
  2022-10-15 23:23   ` Philip McGrath
@ 2022-10-16  7:04     ` Liliana Marie Prikler
  2022-10-16  7:56       ` Philip McGrath
  2022-10-19 15:30     ` Ludovic Courtès
  1 sibling, 1 reply; 14+ messages in thread
From: Liliana Marie Prikler @ 2022-10-16  7:04 UTC (permalink / raw)
  To: Philip McGrath, Ludovic Courtès; +Cc: guix, Maxime Devos

Am Samstag, dem 15.10.2022 um 19:23 -0400 schrieb Philip McGrath:
> On Saturday, October 1, 2022 12:54:27 PM EDT Ludovic Courtès wrote:
> > Hello!
> > 
> > Philip McGrath <philip@philipmcgrath.com> skribis:
> > > 1) If we want to continue to hard-code a specific shell into
> > > Glibc, I think we should document the decision (for example, why
> > > 'bash-static' vs. 'bash-minimal'?) […]
> > 
> > The choice of ‘bash-static’ rather than ‘bash-minimal’ is motivated
> > by the fact that, in (gnu packages commencement), we want to make
> > sure ‘glibc-final’ does not retain references to its build-time
> > environment.  See #:allowed-references in ‘glibc-final’.
> > 
> 
> This makes sense as far as using 'bash-static' in Glibc. The aspects
> I'm unsure of are:
> 
>  1. If I'm packaging software that implements a function like
>     'system' (e.g. Racket, SML/NJ, Chez Scheme, etc.), should I use
>     'bash-minimal' or 'bash-static'?
> 
>  2. Do we really need 'bash-minimal' at all? Why not just replace it
>     with 'bash-static'?
We already explained those two to you. Racket, SML/NJ, Chez Scheme et
al. are not bootstrap-relevant, thus they can use bash-minimal.  Unlike
bash-static, bash-minimal can be grafted, i.e. a security bug in bash(-
minimal) that necessitates a version bump or similar does not cause a
world rebuild.  A security bug in bash-static does.

> In particular, AFAICT, 'bash-minimal' currently has a reference to
> 'bash-static' via Glibc:
> 
> --8<---------------cut here---------------start------------->8---
> $ guix size bash-minimal 
> store item                                                       total    self
> /gnu/store/5h2w4qi9hk1qzzgi1w83220ydslinr4s-glibc-2.33              38.3    36.6  50.4%
> /gnu/store/094bbaq6glba86h1d4cj16xhdi6fk2jl-gcc-10.3.0-lib          71.7    33.4  45.9%
> /gnu/store/720rj90bch716isd8z7lcwrnvz28ap4y-bash-static-5.1.8        1.7     1.7   2.3%
> /gnu/store/chfwin3a4qp1znnpsjbmydr2jbzk0d6y-bash-minimal-5.1.8      72.7     1.0   1.4%
> total: 72.7 MiB
> --8<---------------cut here---------------end--------------->8---
Everything has a reference to bash-static.  That doesn't mean the
static bash is ever invoked.

> > > 2) If we want to make 'sh' a weak/dynamic reference, I think we
> > > should strongly consider arranging to make it available at
> > > '/bin/sh' when present. I expect this option would require less
> > > patching of other packages *by far* than any other approach.
> > 
> > This is not a viable option because build containers lack /bin/sh.
> > 
> 
> Right, this option would depend on making /bin/sh exist in the build
> environment.
> 
> I'd hoped this might be possible without having to change the daemon,
> but the ways I've tried so far haven't worked. I tried `(mkdir-p
> "/bin")`, but the build user apparently doesn't have sufficient
> permissions. Then I tried creating a nested container using
> `call-with-container` in which I could bind-mount the directory from
> 'bash-static' at '/bin', but I hit permissions errors that way, too. I
> also thought there might be a way to pass the daemon options like
> 'build-chroot-dirs' to have it set up /bin/sh before it drops
> privileges, but I couldn't figure out how to do that.
> 
> > Overall, I think the current situation is a reasonable tradeoff. 
> > It forces us to do some patching, indeed, but I think that’s
> > acceptable: we’re talking about a handful of packages.
> > 
> > WDYT?
> > 
> > Ludo’.
> 
> The patching itself isn't so bad, and, as you say, it's at least limited
> to a relatively small number of packages. However, the fact that
> Glibc retains a reference to 'bash-static' affects nearly every
> package. It doesn't affect them very much, to be sure! But I think it
> does prevent using `guix shell --container` to create containers
> without a shell, and it likewise seems difficult to experiment with
> different shells. Or maybe it's really just that it disturbs my sense
> of aesthetics.
Functionality beats aesthetics.

Cheers


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: What 'sh' should 'system' use?
  2022-10-16  7:04     ` Liliana Marie Prikler
@ 2022-10-16  7:56       ` Philip McGrath
  2022-10-16  8:23         ` Liliana Marie Prikler
  0 siblings, 1 reply; 14+ messages in thread
From: Philip McGrath @ 2022-10-16  7:56 UTC (permalink / raw)
  To: Ludovic Courtès, Liliana Marie Prikler; +Cc: guix, Maxime Devos

[-- Attachment #1: Type: text/plain, Size: 3810 bytes --]

On Sunday, October 16, 2022 3:04:45 AM EDT Liliana Marie Prikler wrote:
> Am Samstag, dem 15.10.2022 um 19:23 -0400 schrieb Philip McGrath:
> > On Saturday, October 1, 2022 12:54:27 PM EDT Ludovic Courtès wrote:
> > > Hello!
> > > 
> > > Philip McGrath <philip@philipmcgrath.com> skribis:
> > > > 1) If we want to continue to hard-code a specific shell into
> > > > Glibc, I think we should document the decision (for example,
> > > > why 'bash-static' vs. 'bash-minimal'?) […]
> > > 
> > > The choice of ‘bash-static’ rather than ‘bash-minimal’ is
> > > motivated by the fact that, in (gnu packages commencement), we
> > > want to make sure ‘glibc-final’ does not retain references to its
> > > build-time environment.  See #:allowed-references in
> > > ‘glibc-final’.
> > 
> > This makes sense as far as using 'bash-static' in Glibc. The aspects
> > I'm unsure of are:
> > 
> >  1. If I'm packaging software that implements a function like
> >     'system' (e.g. Racket, SML/NJ, Chez Scheme, etc.), should I use
> >     'bash-minimal' or 'bash-static'?
> > 
> >  2. Do we really need 'bash-minimal' at all? Why not just replace it
> >     with 'bash-static'?
> 
> We already explained those two to you. Racket, SML/NJ, Chez Scheme et
> al. are not bootstrap-relevant, thus they can use bash-minimal.  Unlike
> bash-static, bash-minimal can be grafted, i.e. a security bug in
> bash(-minimal) that necessitates a version bump or similar does not
> cause a world rebuild.  A security bug in bash-static does.
> 

I don't think I understand this. Does it mean that, in the following, I am 
running a Bash that wouldn't have security bugs fixed? If so, that seems quite 
bad!

--8<---------------cut here---------------start------------->8---
philip@bastet:/tmp$ cat run-bshell.scm 
(use-modules
 (guix build-system gnu)
 (guix gexp)
 ((guix licenses) #:prefix license:)
 (guix packages))
(define src
  (plain-file "run-bshell.c"
              "
#include <stdlib.h>
#include <unistd.h>
#include <paths.h>
int main(void)
{
  execl(_PATH_BSHELL,
        _PATH_BSHELL,
        \"-c\",
        \"echo \" _PATH_BSHELL,
        (char *)NULL);
}
"))
(package
  (name "run-bshell")
  (version "0")
  (source src)
  (build-system gnu-build-system)
  (arguments
   (list
    #:phases
    #~(modify-phases %standard-phases
        (delete 'configure)
        (replace 'build
          (lambda args
            (invoke "gcc" "-o" "run-bshell" #$src)))
        (delete 'check)
        (replace 'install
          (lambda args
            (install-file "run-bshell" (string-append #$output "/bin")))))))
  (home-page #f)
  (synopsis #f)
  (description #f)
  (license license:cc0))
philip@bastet:/tmp$ guix shell --rebuild-cache --container --no-cwd -f run-bshell.scm -- run-bshell
substitute: updating substitutes from 'https://bordeaux-us-east-mirror.cbaines.net'... 100.0%
substitute: updating substitutes from 'https://bordeaux.guix.gnu.org'... 100.0%
substitute: updating substitutes from 'https://ci.guix.gnu.org'... 100.0%
The following derivations will be built:
  /gnu/store/q5bib9dgaxzag29a2l4b833mm5l12dx3-profile.drv
  /gnu/store/r45khn1mq17fc0xsab1yszii85ynsm2j-run-bshell-0.drv

building /gnu/store/r45khn1mq17fc0xsab1yszii85ynsm2j-run-bshell-0.drv...
building CA certificate bundle...
listing Emacs sub-directories...
building fonts directory...
building directory of Info manuals...
building profile with 1 package...
/gnu/store/720rj90bch716isd8z7lcwrnvz28ap4y-bash-static-5.1.8/bin/sh
philip@bastet:/tmp$ 
--8<---------------cut here---------------end--------------->8---
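
For context, here is a rough sketch of why the path printed above matters
to every caller of 'system'. It is a simplified stand-in, not Glibc's
actual implementation: the name 'sketch_system' is hypothetical, and
signal handling and the NULL-command case are left out. The point is only
that the shell is reached through an absolute path fixed at compile time
(here '_PATH_BSHELL' from <paths.h>), not through a PATH lookup.

```c
#include <paths.h>      /* _PATH_BSHELL: shell path compiled into libc */
#include <sys/types.h>  /* pid_t */
#include <sys/wait.h>   /* waitpid, WIFEXITED, WEXITSTATUS */
#include <unistd.h>     /* fork, execl, _exit */

/* Hypothetical, simplified stand-in for system(3).
 * Assumption: no signal blocking and no retry on EINTR, unlike a real libc. */
int sketch_system(const char *command)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                      /* fork failed */

    if (pid == 0) {
        /* Child: run "sh -c command" using the compiled-in absolute path. */
        execl(_PATH_BSHELL, "sh", "-c", command, (char *)NULL);
        _exit(127);                     /* only reached if exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;                      /* caller decodes with WEXITSTATUS */
}
```

Whatever store path the C library was built with for its shell is thus
the shell that every such call runs, regardless of what is in the
profile or on PATH.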



* Re: What 'sh' should 'system' use?
  2022-10-16  7:56       ` Philip McGrath
@ 2022-10-16  8:23         ` Liliana Marie Prikler
  0 siblings, 0 replies; 14+ messages in thread
From: Liliana Marie Prikler @ 2022-10-16  8:23 UTC (permalink / raw)
  To: Philip McGrath, Ludovic Courtès; +Cc: guix, Maxime Devos

Am Sonntag, dem 16.10.2022 um 03:56 -0400 schrieb Philip McGrath:
> I don't think I understand this. Does it mean that, in the following,
> I am running a Bash that wouldn't have security bugs fixed? If so,
> that seems quite bad!
You would, but note that in order to exploit this, you would have to
exploit glibc – which can be grafted and could also be built against a
fixed bash.  That is, we'd first have to define bash-static-fixed and
then glibc-fixed, whose bash-static input is replaced with
bash-static-fixed.  Note that this makes sense for a single package, but
obviously doesn't scale well.

Cheers



* Re: What 'sh' should 'system' use?
  2022-10-15 23:23   ` Philip McGrath
  2022-10-16  7:04     ` Liliana Marie Prikler
@ 2022-10-19 15:30     ` Ludovic Courtès
  1 sibling, 0 replies; 14+ messages in thread
From: Ludovic Courtès @ 2022-10-19 15:30 UTC (permalink / raw)
  To: Philip McGrath
  Cc: guix, Maxime Devos, Liliana Marie Prikler, Liliana Marie Prikler

Hi,

Philip McGrath <philip@philipmcgrath.com> skribis:

> This makes sense as far as using 'bash-static' in Glibc. The aspects I'm unsure
> of are:
>
>  1. If I'm packaging software that implements a function like 'system'
>     (e.g. Racket, SML/NJ, Chez Scheme, etc.), should I use 'bash-minimal' or
>     'bash-static'?
>
>  2. Do we really need 'bash-minimal' at all? Why not just replace it with
>     'bash-static'?

Rather ‘bash-minimal’ (so it can benefit from grafts), but in practice
it doesn’t matter all that much.

> In particular, AFAICT, 'bash-minimal' currently has a reference to
> 'bash-static' via Glibc:
>
> $ guix size bash-minimal 
> store item                                                       total    self
> /gnu/store/5h2w4qi9hk1qzzgi1w83220ydslinr4s-glibc-2.33              38.3    36.6  50.4%
> /gnu/store/094bbaq6glba86h1d4cj16xhdi6fk2jl-gcc-10.3.0-lib          71.7    33.4  45.9%
> /gnu/store/720rj90bch716isd8z7lcwrnvz28ap4y-bash-static-5.1.8        1.7     1.7   2.3%
> /gnu/store/chfwin3a4qp1znnpsjbmydr2jbzk0d6y-bash-minimal-5.1.8      72.7     1.0   1.4%
> total: 72.7 MiB

True; at least it’s a small part of the closure size.

>> > 2) If we want to make 'sh' a weak/dynamic reference, I think we should
>> > strongly consider arranging to make it available at '/bin/sh' when
>> > present. I expect this option would require less patching of other
>> > packages *by far* than any other approach.
>> 
>> This is not a viable option because build containers lack /bin/sh.
>> 
>
> Right, this option would depend on making /bin/sh exist in the build
> environment.

And this won’t happen: it’s important that the contents of build
environments be stable over time.  That is the foundation of
reproducible builds in Guix.

> I'd hoped this might be possible without having to change the daemon, but the
> ways I've tried so far haven't worked. I tried `(mkdir-p "/bin")`, but the
> build user apparently doesn't have sufficient permissions. Then I tried
> creating a nested container using `call-with-container` in which I could
> bind-mount the directory from 'bash-static' at '/bin', but I hit permissions
> errors that way, too. I also thought there might be a way to pass the daemon
> options like 'build-chroot-dirs' to have it set up /bin/sh before it drops
> privileges, but I couldn't figure out how to do that.

For the record, /bin/sh was dropped on purpose early on in Guix (unlike
in Nix) to have a limited and well-defined build environment:

  https://lists.gnu.org/archive/html/bug-guix/2013-01/msg00041.html

> The patching itself isn't so bad, and, as you say, it's limited to at least
> a relatively small number of packages. However, the fact that Glibc retains a
> reference to 'bash-static' affects nearly every package. It doesn't affect them
> very much, to be sure! But I think it does prevent using
> `guix shell --container` to create containers without a shell, and it likewise
> seems difficult to experiment with different shells. Or maybe it's really just
> that it disturbs my sense of aesthetics.

The choice was made for ‘guix shell -C’ to always provide /bin/sh but
purely for convenience (I can’t find the reference); it’s not strictly
necessary.

Now, it’s true that currently ‘guix shell -C’ always uses Bash as
/bin/sh.  Are you saying that it would be nice to be able to use a
different shell?  In ‘guix home container’, we made a different choice,
which is to honor $SHELL (see ‘user-shell’ in guix/scripts/home.scm).
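
To illustrate the general idea of honoring $SHELL, here is a minimal
sketch in C.  It is not the code from guix/scripts/home.scm (which is
Scheme); the function name ‘choose_shell’ is hypothetical, and falling
back to ‘_PATH_BSHELL’ when $SHELL is unset or empty is an assumption
made for the example.

```c
#include <paths.h>   /* _PATH_BSHELL: shell path compiled into libc */
#include <stdlib.h>  /* getenv */

/* Hypothetical helper: prefer the user's $SHELL when it is set and
 * non-empty; otherwise fall back to the compiled-in default (an
 * assumption for this sketch, not documented Guix behavior). */
const char *choose_shell(void)
{
    const char *shell = getenv("SHELL");
    if (shell != NULL && shell[0] != '\0')
        return shell;
    return _PATH_BSHELL;
}
```

A caller would then pass the result to something like ‘execl’ in place
of a hard-coded path.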

Thanks,
Ludo’.



end of thread, other threads:[~2022-10-19 15:30 UTC | newest]

Thread overview: 14+ messages
2022-09-19  0:13 What 'sh' should 'system' use? Philip McGrath
2022-09-19  7:07 ` Liliana Marie Prikler
2022-09-26  8:07   ` Philip McGrath
2022-09-26 10:04     ` Liliana Marie Prikler
2022-09-19 12:55 ` Maxime Devos
2022-09-26  7:04   ` Philip McGrath
2022-09-26  9:41     ` Liliana Marie Prikler
2022-09-26 12:24     ` Maxime Devos
2022-10-01 16:54 ` Ludovic Courtès
2022-10-15 23:23   ` Philip McGrath
2022-10-16  7:04     ` Liliana Marie Prikler
2022-10-16  7:56       ` Philip McGrath
2022-10-16  8:23         ` Liliana Marie Prikler
2022-10-19 15:30     ` Ludovic Courtès
