Eli Zaretskii <eliz@gnu.org> schrieb am Sa., 10. Juni 2017 um 14:38 Uhr:

> > From: Philipp Stephani <p.stephani2@gmail.com>
> > Date: Sat, 10 Jun 2017 11:38:32 +0000
> > Cc: emacs-devel@gnu.org, phst@google.com
> >
> > Eli Zaretskii <eliz@gnu.org> schrieb am Mi., 26. Apr. 2017 um 07:32 Uhr:
>
> [It's very hard to have a discussion with 1.5 months between messages.]
>

Yes, sorry for the huge delay, I wanted to follow up here for a while :-/


>
> > > >  > - Using objects across threads requires careful synchronization.
> > > >
> > > >  Not sure what synchronization you have in mind. There's only one
> > > >  running thread at any given time, and a thread cannot be preempted
> > > >  while it runs Lisp, so synchronization seems unnecessary, as a
> thread
> > > >  cannot modify a Lisp object that another thread is modifying.
> > > >
> > > > However, that's just an implementation detail, and the mapping
> between
> > > Emacs and OS threads is
> > > > unspecified (for good reason).
> > >
> > > It isn't unspecified: the ELisp manual explicitly documents that only
> > > one Lisp thread can run at any given time.
> >
> > That is true for a *Lisp* thread, not for an *OS* thread. On the OS
> thread
> > level we might theoretically have multiple threads per Lisp thread, or
> miss
> > some synchronization barriers, etc.
>
> You were AFAIU talking about accesses to Lisp objects, and these are
> only possible via Lisp, which can only be run in a single thread at a
> time.  So if the non-Lisp threads of any kind can enter this picture,
> in a way that module functions will be run by those threads and access
> Lisp objects, please describe such a use case, so we all are on the
> same page regarding the situations we are considering.
>

Such a use case is what these assertions should guard against. Calling
environment functions from non-Lisp threads is undefined behavior, and
these assertions are meant to protect module developers against them.


>
> > > > To be forward-compatible, either we have to formally document this
> and
> > > then
> > > > never ever change the implementation, or document that module authors
> > > can't rely on it and have to perform
> > > > synchronization themselves.
> > >
> > > Since it is documented already, module authors can rely on that.
> >
> > It's not documented because it can't be documented without talking about
> > the behavior of the C module API, and that is itself not documented yet.
>
> If what you mean is that we don't say explicitly in the doc that only
> one Lisp thread can be active at any given time, we can add that
> statement right now.  Would that be enough to settle this particular
> argument.
>

There is no actual argument :-)
It's just a matter of how much documentation we need. I think the
guaranteed synchronization should be documented in the to-be-written
modules manual as well, if we choose to make the guarantee.


>
> > If
> > we choose to guarantee that unsynchronized accesses to module functions
> > don't introduce data races, then we need to say that explicitly in the
> > module documentation, and stick to it. (I.e. once that decision has been
> > made, there's no going back.)
>
> I think that ship sailed long ago: the current implementation assumes
> implicitly and explicitly that at most only one Lisp thread runs at
> any given time.  If we will ever want to have an implementation that
> violates this constraint, we will have to completely rewrite
> everything that is related to threads, including emacs-modules.c.
>

Yes, but if we do decide to rewrite, then the new code might need
additional synchronization if we guarantee thread safety which would
otherwise not be required.


>
> > > > I meant this as a general statement: if an API is more restrictive
> (i.e.
> > > has stronger preconditions), its
> > > > implementation can be more flexible because it has to deal with fewer
> > > corner cases.
> > >
> > > I don't think it's right for us to favor our flexibility over that of
> > > the Lisp programmers who will want to use these features.
> >
> > That's a matter for debate.
>
> I'm saying that as long as this is debatable, we shouldn't make life
> for Lisp programmers so much harder based on assumptions that are not
> universally approved by the project as a whole.  My opinion is that
> the Lisp programmers' flexibility should be preferable to that of
> developers.
>

The two options are:
1. Guarantee thread safety of module objects.
2. Don't guarantee thread safety of module objects.
You clearly advocate for (1), which is fine. What I'm trying to point out
is that if we picked (2) now, we can easily switch to (1) later, but not
the other way round. Once we decide to do (1), there's no way to go to (2),
even if in some future implementation (2) might become simpler or more
efficient. That's the tradeoff.


>
> > > I think we only care that module functions run in the context of some
> > > Lisp thread, and we don't care which Lisp thread is that.  So I
> > > proposed a simplified implementation of the test, which I hope you
> > > will agree with.
> >
> > If we want to guarantee that environment functions can be called from
> > arbitrary threads without introducing data races, then the only assertion
> > that's necessary is whether (current_thread->thread_id ==
> > GetCurrentThreadId ()); without that undefined behavior would be almost
> > guaranteed.
>
> Sorry, I don't understand what you are saying here.  Especially since
> your proposed condition is almost the one I proposed in
>
>   http://lists.gnu.org/archive/html/emacs-devel/2017-04/msg00720.html
>
> Does this mean that you agree with my reasoning and the code I
> proposed there?
>

Yes, almost, if we pick option (1) now.


>
> Once again: there _are_ legitimate situations in Emacs when for a
> short time current_thread is NULL.  If you assume that these
> situations don't happen, your code will sooner or later crash and
> burn.  I'm saying this because I once thought current_thread should be
> non-NULL at all times, and my code which assumed that did crash.
>

I've reviewed the threads code, and I'm quite sure that current_thread can
never be NULL during check_thread. current_thread is only NULL between
exiting a Lisp thread and resuming execution in another thread after
reacquiring the GIL. The evaluator doesn't run during that phase, and
therefore no module functions can run.
current_thread could only be NULL during check_thread if called from a
thread that is not a Lisp thread (i.e. an OS thread created by the module).
That's exactly one of the undefined behavior situations that the assertions
are meant to prevent.