Ludovic Courtès <ludo@gnu.org> writes:

>  Previously, services started indirectly with ‘exec-command’ (which is usually
>  the case) would not inherit any file descriptor from shepherd because
>  ‘exec-command’ would explicitly close all of them.  However, services started
>  with ‘make-system-constructor’ and processes created by some other means, such
>  as calling ‘system*’, would inherit some of those descriptors, giving them
>  more authority than intended.

Interestingly enough, guile's system* closes all file descriptors aside
from the first 3, and we now provide a replacement for system* that uses
fork+exec-command and therefore does not.

> The FD-closing loop was removed on purpose, in
> 2c0354258047133db8b885bcc11afdf0def5d885.
>
> Now, as you write, it means that service writers must be careful now and
> not leave any non-CLOEXEC file descriptor behind them.
>
> At the time I audited Guix System to check that this was a reasonable
> thing to expect and that we could indeed ensure no file descriptors were
> leaked.  There’s also ‘tests/close-on-exec.sh’.
>
> If you found cases where it would be necessary, what we could do is have
> ‘shepherd’ replace ‘call-with-input-file’ & co. with a variant that
> opens files as O_CLOEXEC by default.  WDYT?
The problem is that there isn't a way for the user to replicate the old
behavior of default-non-inheriting short of wrapping the desired command
with another file-descriptor-closing program, by which point the user
and group may already have been changed.

O_CLOEXEC is good, but to use it to prevent leaks to any particular
program is a matter of global correctness, while closing everything not
explicitly allowed for a program is a matter of local correctness.  For
example, if I have a program whose quality (and, to some extent,
trustworthiness) I am wary of, I would really prefer to be certain that
it doesn't get any file descriptors it shouldn't.  The only way to be
certain of this currently is to examine the entire shepherd process -
including all services - to be sure there aren't any leaks.

Replacing 'call-with-input-file' and such - as I now see is done in
guix's shepherd config file - would probably be a good idea (I see we
use call-with-input-file in primitive-load* and read-pid-file, for
example), but it's still no guarantee.

For example, I didn't even know SOCK_CLOEXEC was a thing, and so didn't
use it in one of my services that calls socketpair.  I did use fcntl
afterward, but the time between those two introduces all kinds of
questions, like "have you called system, system*, spawn-command, sleep,
done any I/O, or had the misfortune to receive a SIGCHLD on a system
without signalfd support".

And if I hadn't happened to have looked at the source of exec-command, I
wouldn't have even tried fcntl, because the manual entry for
fork+exec-command and exec-command still clearly states:

  File descriptors 1 and 2 are kept as is or redirected to either
  LOG-PORT or LOG-FILE if it’s true, whereas file descriptor 0 (standard
  input) points to INPUT-PORT or ‘/dev/null’; all other file descriptors
  are closed prior to yielding control to COMMAND.

Whatever course of action is taken, it is imperative that the
documentation and code be aligned on this matter.  Personally I'm
inclined to bring the code in line with the documentation.

>>    In other words, the copying of earlier FDs in EXTRA-PORTS may overwrite
>>    later FDs in EXTRA-PORTS.
>
> Good catch!
>
> Could you make a more minimal patch fixing this specific issue, also
> adding a test reproducing the problem being fixed?

I'm sending a more granular v2 series that adds the requested test,
fixes the clobbering, makes extra-ports be honored regardless of the
values of log-port and log-file, fixes edge cases around file
descriptors 0-2, and finally adds support for closing all other file
descriptors to exec-command.  That last one also makes closing all other
file descriptors the default, mostly because it makes the most sense for
the default value of extra-ports, and otherwise I'd have to update the
manual.  It does include the option of #:extra-ports #t, though, in
which case no fds are closed.

One alternative would be to have the close-all-others behavior
controlled by an independent argument; that way the file descriptor
rearranging functionality can be independent of whether other file
descriptors are closed.

Speaking of file descriptor rearranging functionality, the v2 replaces
'preserve-ports' with 'reconfigure-fds', which uses a different,
graph-based approach to avoiding clobbering while using fewer system
calls.  It's also more robust, and will function as long as there is
even a single unused file descriptor, and won't touch any open file
descriptors other than those in the target range.

- ulfvonbelow