Re: master 8dcb19f 4/4: Add a unit test testing interaction between threads and processes.

unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed

From: Philipp <p.stephani2@gmail.com>
To: Eli Zaretskii <eliz@gnu.org>
Cc: Philipp Stephani <phst@google.com>, emacs-devel@gnu.org
Subject: Re: master 8dcb19f 4/4: Add a unit test testing interaction between threads and processes.
Date: Thu, 18 Mar 2021 13:51:10 +0100	[thread overview]
Message-ID: <25EEFE6F-9532-4A5A-8824-A3F8F41012F3@gmail.com> (raw)
In-Reply-To: <83mtvo7xno.fsf@gnu.org>

> Am 28.02.2021 um 21:19 schrieb Eli Zaretskii <eliz@gnu.org>:
> 
>> From: Philipp <p.stephani2@gmail.com>
>> Date: Sun, 28 Feb 2021 19:30:46 +0100
>> Cc: Philipp Stephani <phst@google.com>,
>> emacs-devel@gnu.org
>> 
>> The specific scenario here is: Start a number of process and a number of threads, each of which waits for one of the processes
> 
> See below: I don't see how you ensure in this test that each thread
> waits for just one process.

Shouldn't providing a process argument to accept-process-output ensure this, modulo implementation details that are not observable?  The only nontrivial guarantee that the test requires is that after (accept-process-output PROC) returns nil, the PROC process has finished, independent of other processes or threads, and that multiple concurrent calls of this form don't deadlock.

> 
>> then, cause the processes to exit (since they run „cat“, that works by sending EOF); then, join all the threads.  This should work without hangs or errors.
> 
> There are two "inputs" from each subprocess that Emacs receives: the
> self-pipe output and the SIGCHLD signal.  Depending on the timing and
> on which thread receives which input, the behavior can be different.
> Do you have a clear idea what should happen and what does happen wrt
> these "inputs"?  In particular, which thread(s) do you expect to see
> SIGCHLD and/or read the self-pipe as result of that, and do you think
> it's important whether a thread gets awoken by the signal or by
> reading the self-pipe?

All of those are implementation details that should be entirely invisible to the thread interface.  As such, none of the threads should "see" any of those.  All that matters is the observable behavior of accept-process-output: it may return nil only if and when its process argument has finished, independent of threads or signal disposition.

> 
>>>> The test wouldn't work without it, because it calls
>>>> accept-process-output from a thread different from the thread in which
>>>> the process was created.
>>> 
>>> But set-process-thread accepts a thread argument, so you could bind
>>> the process to a particular thread, couldn't you?
>> 
>> Yes, but the threads that call `accept-process-output’ and `process-send-eof’ are different threads; binding the process object to one of the threads would cause the other call to fail.
> 
> As Stefan points out, process-send-eof could be sent from any thread.

Correct, though this shouldn't make a difference: the test should pass whether the process objects have thread affinity or not (and that's the case for me).

> 
>>> More generally, my reading of the code is that the first thread which
>>> calls accept-process-output will start listening on all the file
>>> descriptors of all the sub-processes which were started until then,
>>> and all the other threads will have no descriptors left to listen on.
>>> Is this the situation you intended to create?
>> 
>> No, but I’m not convinced that that’s actually how the implementation behaves.  Wouldn’t with the current implementation each thread wait for the file descriptors of one process?
> 
> How would that happen, in this particular test?  The test first starts
> 10 processes, then creates 10 threads, then calls thread-join for each
> of the threads.  The threads that the test create don't run, because
> the first thing each thread does is attempt to take the global lock,
> and becomes blocked because the main thread still runs.  The main
> thread yields the first time it calls thread-join; at that time, one
> of the 10 waiting threads (not necessarily the one passed as argument
> to thread-join) will take the global lock.  This thread will call
> accept-process-output, which calls wait_reading_process_output, which
> computes the mask for pselect to wait on.  Now look at the code in
> compute_non_keyboard_wait_mask and its ilk: the mask will include
> every descriptor that is not already watched by some other thread.
> Since this is the first thread that calls wait_reading_process_output,
> it will find all the descriptors up for grabs, and will add all of
> them to the mask.  Then it will release the global lock and call
> pselect.  Releasing the global lock will let some other thread of the
> 9 remaining threads run, that thread will also call
> accept-process-output, which will call wait_reading_process_output --
> but now all the descriptors are taken by the first thread, so there's
> nothing left for the second one, and the same for all the other 8
> threads.
> 
> So what we have now is one thread waiting for output from all the 10
> subprocess, and the other 9 waiting only on the self-pipe.  What
> happens then, when the processes exit and we get 10 SIGCHLD signals,
> is left as an exercise.  And don't forget that order in which we
> receive the signal and the byte from self-pipe is not entirely
> deterministic, either.
> 
> Am I missing something, or is this what happens the way the test is
> written?  And if I'm not missing anything, is this the situation you
> wanted to create and test?

This might be how the implementation behaves (I haven't checked, but I have no reason to doubt your explanation).  However, the test doesn't test the implementation, but the observable behavior.  That is, I can't formally prove that the implementation leads to the observable behavior that the test expects, but I think it should, and I'd consider it a bug (in Emacs, not the test code) if it doesn't.

> 
>>> Why is it important to test this situation, and what do we expect to
>>> happen in it?  Is it reasonable to have more than one thread wait for
>>> output from a subprocess?
>> 
>> The tests shouldn’t ask what’s reasonable, but what’s documented or expected.  Surely users can write code like this, and I don’t see why that shouldn’t be supported.
> 
> I didn't say anything about unsupporting this situation, I asked what
> should happen in it, and I meant the details, not merely that there
> are no errors.  E.g., is it important that each threads gets output
> from exactly one subprocess?

How does a thread "get output"?  Do you mean that accept-process-output returns non-nil?  What I'd expect from accept-process-output with a process argument is that directly after it returns nil, the process represented by the process argument has finished, all output has been passed to buffers/filters, and the sentinel has run.  That's independent of the thread in which accept-process-output is called.

> is it important whether a thread exits
> because it read output or because it received SIGCHLD and read the
> self-pipe?

No, these are implementation details: they might as well not exist as far as the observable behavior of accept-process-output is concerned.

> is it important whether all the threads exit one after
> another, or is it okay if they all wait until one thread exits and
> then exit immediately (because all the processes have already died)?

I don't think that's observable/documented behavior: we only know that a thread has exited after thread-join returns.

> etc.
> 
> I instrumented the code that gets executed by this test and saw all
> kinds of strange phenomena, and I'm asking myself whether you looked
> at those details and whether you are satisfied by what you saw.

No, that's not the way I approach topics like this.  I wrote the test based on my reading of the documentation (having in mind the complexity of the implementation, but not being guided by it).  I then ran the test and it passed, which gave me a bit of confidence in the correctness of the implementation (which the test can't prove, of course).
So the questions I'd ask here: Is the test code correct, based on documented (or at least implied) guarantees?  If it's correct, but the test fails on some system, the implementation is buggy.  What specific sequence of events in the implementation code leads to the buggy behavior is then only the next step, not the starting point of the discussion.

> 
>>> The question I ask myself is what to do with the deviant behavior on
>>> MS-Windows.  If it behaves differently because what we test here are
>>> the peculiarities of the semi-broken Unix mechanism known as "signals"
>>> and/or some details of our implementation of subprocesses on Posix
>>> platforms, then maybe there's no need to change anything in the
>>> Windows code, and just either skip this test on MS-Windows or write a
>>> different test which will work there.  IOW, in such a case there's no
>>> bug in the Windows port, it's just that it isn't bug-for-bug
>>> compatible with Unix.
>> 
>> I don’t think so.  Again, I wrote the test based on my understanding of the expected and promised behavior of the process and thread functions, not the Posix implementation.
> 
> Where and how did you obtain that understanding?  The documentation is
> (intentionally or unintentionally) very vague regarding these details.

Yes, so I had to fill in some aspects that I hope are reasonable.  In particular, the documentation doesn't state explicitly that waiting for a process in a different thread than the original one is supported, but given set-process-thread it implicitly has to be, because what would otherwise be the purpose of set-process-thread?
The documentation definitely needs to be significantly expanded, as this discussion shows.
So maybe asking more directly:
- Should this test pass?
- If so, how can this be deduced from the documentation?
- If not, how can *that* be deduced from the documentation, and what changes would be necessary to guarantee that it passes?
- Is there anything missing from the documentation that prevents deducing whether the test should pass or not?

> Perhaps you only cared about whether the code should or shouldn't err,
> but that's clearly not the only interesting or important aspect of
> this exercise, is it?

It's at least one important aspect, but finding deadlocks (hangs) is probably more important.

> 
>>> But if what happens in this test is important in practical use cases,
>>> then perhaps our Windows code does need some changes in this area.
>>> 
>>> This is why I'm trying to understand the semantics of this test -- I'd
>>> like to figure out which implementation details are important, and
>>> thus should be emulated on Windows, and which are just details that
>>> can be different (and portable Lisp programs will have to take those
>>> differences into account).
>> 
>> The way I see it, all aspects of the test are important — their behavior should be guaranteed by the implementation.
> 
> Which aspects of those I mentioned above do you think the
> implementation should guarantee?  E.g., should it guarantee how many
> threads get the actual output from the processes in this case,

That depends what "get the actual output" means.  What I expect is the behavior of accept-process-output, as explained above. Specifically: accept-process-output will return nil quickly (within milliseconds, assuming the calling thread is unblocked) once the given process has exited.  Once it returns, all process output has been given to process filters/buffers, and the sentinel has run.

> and how
> many get the signal?

None, as the signal is an implementation detail and not part of the interface.

> 
>>> For example, the test verifies that each process exited with zero
>>> status, but it doesn't care which thread detected that -- is this
>>> important?
>> 
>> IIUC threads don’t really „detect“ process exit.
> 
> ??? When a subprocess exits, Emacs gets a SIGCHLD, and then we access
> the exit code in order to pass it to the sentinel function.  This is
> what I call "detecting process exit".

Sure, but is that really related to threads?  I'd rather say that *Emacs* detects process exits.
It's also true that there's some thread that contains the `accept' call, but that's again an implementation detail and not related to Lisp threads.  Several other implementations would be possible: Calling `accept' in a single thread not accessible from Lisp at all; implementing Lisp threads as "green threads" without pthread support; etc.

> 
>> What I’d assume is that the `thread-join’ calls are synchronization points — after calling `thread-join’ the thread in question is guaranteed to have finished, and since the thread has been waiting for a process to finish (using the accept-process-output loop), the process is also guaranteed to have finished, and this sequence of event is guaranteed exactly in that order.
> 
> I don't see how you reach those conclusions.  The first part is true:
> when thread-join exits, the thread it was waiting for is guaranteed to
> have exited.  But how it exited, and what caused it to exit -- these
> are open questions, in a situation where 10 subprocesses generate
> output and produce 10 signals.

There's only one way for a thread to exit: when its thread function returns.  At least that's what the manual says.  And the thread function can only return when `accept-process-output' returns nil, so we're back at the discussion what the semantics of `accept-process-output' are.

> 
>>> Or what about delivery of SIGCHLD -- is this important
>>> which thread gets delivered the signal, or how many threads should the
>>> signal wake up?
>> 
>> Not as long as the observable behavior stays the same (and isn’t buggy).
> 
> And the "observable behavior" is what? that processes exited
> successfully and no Lisp errors happened?  Or is there something else
> we expect in this case?

That the code as a whole doesn't hang/deadlock.  It's harder to make precise guarantees for that, but the 60s timeout should be long enough for the test to finish even on a heavily-loaded system.

> 
>>> Or is it important when and due to what reason(s)
>>> does each thread exit?
>> 
>> It’s important that each thread exits normally, i.e., due to its thread function body exiting normally.
> 
> So it is NOT important what event caused the thread function to run,
> and which event caused it to exit accept-process-output and finish?

No.  AFAIK these are unobservable, and the manual doesn't make any guarantees in this respect.

> 
>> the observable order of events matters: first process-send-eof, then process exit, then thread exit, then thread-join returns.
> 
> The order of the first two is guaranteed because a single thread
> arranges for that sequentially.  But do you care if the sequence is
> like this:
> 
>  . all the 10 processes exit
>  . one of the threads exit because the processes exited
>  . the other 9 threads get awoken one by one, exit
>    accept-process-output because of SIGCHLD, end exit

From what I can see, that should be fine.  The test only checks that the threads finish eventually and don't deadlock, but doesn't care about the order of events across threads (which is not guaranteed unless there are explicit synchronization points like mutexes or condition variables).

next prev parent reply	other threads:[~2021-03-18 12:51 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-23 19:49 master 8dcb19f 4/4: Add a unit test testing interaction between threads and processes Eli Zaretskii
2021-01-23 20:27 ` Eli Zaretskii
2021-01-23 20:33   ` Philipp Stephani
2021-01-24  5:26     ` Eli Zaretskii
2021-01-24 13:22       ` Philipp Stephani
2021-01-24 15:30         ` Eli Zaretskii
2021-02-28 18:30           ` Philipp
2021-02-28 18:36             ` Stefan Monnier
2021-02-28 18:52               ` Philipp
2021-02-28 20:19             ` Eli Zaretskii
2021-03-18 12:51               ` Philipp [this message]
2021-03-18 15:39                 ` Eli Zaretskii
2021-03-21 12:00                   ` Philipp
2021-03-21 12:23                     ` Eli Zaretskii
2021-01-23 20:34 ` Philipp Stephani
2021-01-24  3:36   ` Eli Zaretskii
2021-01-24 14:43     ` Eli Zaretskii

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=25EEFE6F-9532-4A5A-8824-A3F8F41012F3@gmail.com \
    --to=p.stephani2@gmail.com \
    --cc=eliz@gnu.org \
    --cc=emacs-devel@gnu.org \
    --cc=phst@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).