unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
* SCM_SYSCALL
@ 2013-06-23 21:25 Ludovic Courtès
  2013-07-03 18:19 ` SCM_SYSCALL Mark H Weaver
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Ludovic Courtès @ 2013-06-23 21:25 UTC (permalink / raw)
  To: guile-devel

Hello Guilers!

We have this (since 2010):

--8<---------------cut here---------------start------------->8---
#   define SCM_SYSCALL(line)                    \
  do                                            \
    {                                           \
      errno = 0;                                \
      line;                                     \
      if (errno == EINTR)                       \
        {                                       \
          SCM_ASYNC_TICK;                       \
          continue;                             \
        }                                       \
    }                                           \
  while(0)
--8<---------------cut here---------------end--------------->8---

It turns out that the effect upon EINTR is to leave the loop.  So
typically, fport_fill_input just throws to system-error and reveals the
EINTR, contrary to SCM_SYSCALL intends to do.

This is easily fixed, but the question is whether this would affect
users in bad ways.  For example, applications might be relying on the
ability to do

  (catch 'system-error
    ...
    (lambda args
      (if (= EINTR (system-error-errno args))
          ...)))

Should the fix be delayed until 2.2?
WDYT?

Ludo’.




^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: SCM_SYSCALL
  2013-06-23 21:25 SCM_SYSCALL Ludovic Courtès
@ 2013-07-03 18:19 ` Mark H Weaver
  2013-07-04 22:28   ` SCM_SYSCALL Ludovic Courtès
  2013-07-17 16:04 ` SCM_SYSCALL Ludovic Courtès
  2014-03-23 19:56 ` SCM_SYSCALL Andy Wingo
  2 siblings, 1 reply; 9+ messages in thread
From: Mark H Weaver @ 2013-07-03 18:19 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guile-devel

Hi Ludovic,

ludo@gnu.org (Ludovic Courtès) writes:

> We have this (since 2010):
>
> #   define SCM_SYSCALL(line)                    \
>   do                                            \
>     {                                           \
>       errno = 0;                                \
>       line;                                     \
>       if (errno == EINTR)                       \
>         {                                       \
>           SCM_ASYNC_TICK;                       \
>           continue;                             \
>         }                                       \
>     }                                           \
>   while(0)
>
> It turns out that the effect upon EINTR is to leave the loop.  So
> typically, fport_fill_input just throws to system-error and reveals the
> EINTR, contrary to SCM_SYSCALL intends to do.

Ugh.  Well, I guess this finally explains <http://bugs.gnu.org/13018>.
Thanks for tracking this down.

> This is easily fixed, but the question is whether this would affect
> users in bad ways.  For example, applications might be relying on the
> ability to do
>
>   (catch 'system-error
>     ...
>     (lambda args
>       (if (= EINTR (system-error-errno args))
>           ...)))
>
> Should the fix be delayed until 2.2?
> WDYT?

I strongly believe that we should fix this in stable-2.0.  While it is
true that the above scenario is possible, I suspect it is at least an
order of magnitude more common for Guile-based software to be written
based on the presumption that EINTR is handled automatically.

Not only did all versions of Guile 1.x automatically handle EINTR, but
most of us assumed that this behavior was unchanged in Guile 2.0 and
wrote our software based on that assumption.  I certainly did.

As it is now, even portable Scheme code that uses (read) might result in
exceptions being thrown semi-randomly.  We cannot reasonably expect
Guile programs to put each (read) within a loop to handle EINTR.

Please, let's fix this in stable-2.0.

What do you think?

    Mark



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: SCM_SYSCALL
  2013-07-03 18:19 ` SCM_SYSCALL Mark H Weaver
@ 2013-07-04 22:28   ` Ludovic Courtès
  2013-07-05 18:56     ` SCM_SYSCALL Mark H Weaver
  0 siblings, 1 reply; 9+ messages in thread
From: Ludovic Courtès @ 2013-07-04 22:28 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: guile-devel

Hi Mark,

Mark H Weaver <mhw@netris.org> skribis:

> ludo@gnu.org (Ludovic Courtès) writes:
>
>> We have this (since 2010):
>>
>> #   define SCM_SYSCALL(line)                    \
>>   do                                            \
>>     {                                           \
>>       errno = 0;                                \
>>       line;                                     \
>>       if (errno == EINTR)                       \
>>         {                                       \
>>           SCM_ASYNC_TICK;                       \
>>           continue;                             \
>>         }                                       \
>>     }                                           \
>>   while(0)
>>
>> It turns out that the effect upon EINTR is to leave the loop.  So
>> typically, fport_fill_input just throws to system-error and reveals the
>> EINTR, contrary to SCM_SYSCALL intends to do.
>
> Ugh.  Well, I guess this finally explains <http://bugs.gnu.org/13018>.

Indeed.  (Funny to see how I was blissfully quoting the above macro
saying: “look, EINTR is handled, of course!”.  :-))

> I strongly believe that we should fix this in stable-2.0.  While it is
> true that the above scenario is possible, I suspect it is at least an
> order of magnitude more common for Guile-based software to be written
> based on the presumption that EINTR is handled automatically.
>
> Not only did all versions of Guile 1.x automatically handle EINTR, but
> most of us assumed that this behavior was unchanged in Guile 2.0 and
> wrote our software based on that assumption.  I certainly did.
>
> As it is now, even portable Scheme code that uses (read) might result in
> exceptions being thrown semi-randomly.  We cannot reasonably expect
> Guile programs to put each (read) within a loop to handle EINTR.
>
> Please, let's fix this in stable-2.0.

Yes, I’ve reached that conclusion too.

I’ve been cooking a patch but the test case ends up being trickier to
write than I expected.  Here’s what I have:

        (let* ((in+out   (pk 'pipe (pipe)))
               (lock     (make-mutex))
               (cond     (make-condition-variable))
               (signaled #f)
               (thread   (call-with-new-thread
                          (lambda ()
                            (with-mutex lock
                              (display "hello " (cdr in+out))
                              (wait-condition-variable cond lock)
                              (display "world\n" (cdr in+out))
                              (close-port (cdr in+out)))))))
          (define handle
            (lambda (signum)
              (with-mutex lock
                (set! signaled (pk 'sig signum))
                (signal-condition-variable cond))))
          (sigaction SIGALRM handle 0)
          (alarm 2)

          ;; This thread (the main thread) receives the signal.  Yet,
          ;; the EINTR returned by read(2) as called via `read-line'
          ;; must be swallowed.
          (let ((line (read-line (car in+out))))
            (join-thread thread)
            (list signaled line)))

This nicely reproduces the problem where fport_fill_input throws to
‘system-error’ with EINTR.

However, with a fixed SCM_SYSCALL, the result is pretty much the same as
with SA_RESTART (see <http://bugs.gnu.org/14640>): when SCM_ASYNC_TICK
is called right after we get EINTR, chances are that the async hasn’t
been queued yet, so we get back to our read(2) call, and thus the
Scheme-level signal handler is never called.  (Typically, when running
the test through strace, it passes, because the timing is “better”, but
it fails without strace.)

Suggestions?

Thanks,
Ludo’.



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: SCM_SYSCALL
  2013-07-04 22:28   ` SCM_SYSCALL Ludovic Courtès
@ 2013-07-05 18:56     ` Mark H Weaver
  2013-07-05 20:01       ` SCM_SYSCALL Ludovic Courtès
  0 siblings, 1 reply; 9+ messages in thread
From: Mark H Weaver @ 2013-07-05 18:56 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guile-devel

Hi Ludovic,

ludo@gnu.org (Ludovic Courtès) writes:
> I’ve been cooking a patch but the test case ends up being trickier to
> write than I expected.  Here’s what I have:
>
>         (let* ((in+out   (pk 'pipe (pipe)))
>                (lock     (make-mutex))
>                (cond     (make-condition-variable))
>                (signaled #f)
>                (thread   (call-with-new-thread
>                           (lambda ()
>                             (with-mutex lock
>                               (display "hello " (cdr in+out))
>                               (wait-condition-variable cond lock)
>                               (display "world\n" (cdr in+out))
>                               (close-port (cdr in+out)))))))
>           (define handle
>             (lambda (signum)
>               (with-mutex lock
>                 (set! signaled (pk 'sig signum))
>                 (signal-condition-variable cond))))
>           (sigaction SIGALRM handle 0)
>           (alarm 2)
>
>           ;; This thread (the main thread) receives the signal.  Yet,
>           ;; the EINTR returned by read(2) as called via `read-line'
>           ;; must be swallowed.
>           (let ((line (read-line (car in+out))))
>             (join-thread thread)
>             (list signaled line)))
>
> This nicely reproduces the problem where fport_fill_input throws to
> ‘system-error’ with EINTR.
>
> However, with a fixed SCM_SYSCALL, the result is pretty much the same as
> with SA_RESTART (see <http://bugs.gnu.org/14640>): when SCM_ASYNC_TICK
> is called right after we get EINTR, chances are that the async hasn’t
> been queued yet, so we get back to our read(2) call, and thus the
> Scheme-level signal handler is never called.  (Typically, when running
> the test through strace, it passes, because the timing is “better”, but
> it fails without strace.)
>
> Suggestions?

Hmm.  Shouldn't our signal handlers be run in a different thread?  Maybe
we can't make this change until 2.2, but it seems to me that there are
very serious problems trying to run signal handlers from within asyncs,
analogous to the problems running finalizers within asyncs.  Commonly,
signal handlers need to mutate some global state, but that cannot in
general be done safely from within asyncs, because asyncs might be
called while the global state is in an inconsistent state, at least for
data structures implemented in Scheme.

What do you think?

      Mark



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: SCM_SYSCALL
  2013-07-05 18:56     ` SCM_SYSCALL Mark H Weaver
@ 2013-07-05 20:01       ` Ludovic Courtès
  2013-07-06 16:41         ` SCM_SYSCALL Mark H Weaver
  0 siblings, 1 reply; 9+ messages in thread
From: Ludovic Courtès @ 2013-07-05 20:01 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: guile-devel

Hi Mark!

Mark H Weaver <mhw@netris.org> skribis:

> ludo@gnu.org (Ludovic Courtès) writes:
>> I’ve been cooking a patch but the test case ends up being trickier to
>> write than I expected.  Here’s what I have:
>>
>>         (let* ((in+out   (pk 'pipe (pipe)))
>>                (lock     (make-mutex))
>>                (cond     (make-condition-variable))
>>                (signaled #f)
>>                (thread   (call-with-new-thread
>>                           (lambda ()
>>                             (with-mutex lock
>>                               (display "hello " (cdr in+out))
>>                               (wait-condition-variable cond lock)
>>                               (display "world\n" (cdr in+out))
>>                               (close-port (cdr in+out)))))))
>>           (define handle
>>             (lambda (signum)
>>               (with-mutex lock
>>                 (set! signaled (pk 'sig signum))
>>                 (signal-condition-variable cond))))
>>           (sigaction SIGALRM handle 0)
>>           (alarm 2)
>>
>>           ;; This thread (the main thread) receives the signal.  Yet,
>>           ;; the EINTR returned by read(2) as called via `read-line'
>>           ;; must be swallowed.
>>           (let ((line (read-line (car in+out))))
>>             (join-thread thread)
>>             (list signaled line)))
>>
>> This nicely reproduces the problem where fport_fill_input throws to
>> ‘system-error’ with EINTR.
>>
>> However, with a fixed SCM_SYSCALL, the result is pretty much the same as
>> with SA_RESTART (see <http://bugs.gnu.org/14640>): when SCM_ASYNC_TICK
>> is called right after we get EINTR, chances are that the async hasn’t
>> been queued yet, so we get back to our read(2) call, and thus the
>> Scheme-level signal handler is never called.  (Typically, when running
>> the test through strace, it passes, because the timing is “better”, but
>> it fails without strace.)
>>
>> Suggestions?
>
> Hmm.  Shouldn't our signal handlers be run in a different thread?  Maybe
> we can't make this change until 2.2, but it seems to me that there are
> very serious problems trying to run signal handlers from within asyncs,
> analogous to the problems running finalizers within asyncs.  Commonly,
> signal handlers need to mutate some global state, but that cannot in
> general be done safely from within asyncs, because asyncs might be
> called while the global state is in an inconsistent state, at least for
> data structures implemented in Scheme.
>
> What do you think?

I think the rationale was that signal handlers in Guile would be a
simplified version of what POSIX provides.  That is, they are called in
the thread that called ‘sigaction’, and there are no restrictions on
what procedures can be used from within the handler.  From that
perspective, I think it fits the bill.

Now, of course that introduces concurrency, but that’s what signals are
about anyway: asynchronous notifications.  Thus I don’t have any
particular problems with this implementation.

Am I overlooking other issues you had in mind?

Thanks,
Ludo’.



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: SCM_SYSCALL
  2013-07-05 20:01       ` SCM_SYSCALL Ludovic Courtès
@ 2013-07-06 16:41         ` Mark H Weaver
  2013-07-06 21:05           ` SCM_SYSCALL Ludovic Courtès
  0 siblings, 1 reply; 9+ messages in thread
From: Mark H Weaver @ 2013-07-06 16:41 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guile-devel

Hi Ludovic,

ludo@gnu.org (Ludovic Courtès) writes:

> Mark H Weaver <mhw@netris.org> skribis:
>
>> Hmm.  Shouldn't our signal handlers be run in a different thread?  Maybe
>> we can't make this change until 2.2, but it seems to me that there are
>> very serious problems trying to run signal handlers from within asyncs,
>> analogous to the problems running finalizers within asyncs.  Commonly,
>> signal handlers need to mutate some global state, but that cannot in
>> general be done safely from within asyncs, because asyncs might be
>> called while the global state is in an inconsistent state, at least for
>> data structures implemented in Scheme.
>>
>> What do you think?
>
> I think the rationale was that signal handlers in Guile would be a
> simplified version of what POSIX provides.  That is, they are called in
> the thread that called ‘sigaction’, and there are no restrictions on
> what procedures can be used from within the handler.  From that
> perspective, I think it fits the bill.
>
> Now, of course that introduces concurrency, but that’s what signals are
> about anyway: asynchronous notifications.  Thus I don’t have any
> particular problems with this implementation.

I looked more carefully, and agree that our current API is fine.  It
makes it easy to handle signals in a different thread, if desired, or to
avoid the complications of multi-threaded programming and rely instead
of blocking asyncs.

So, back to the problem at hand:

> However, with a fixed SCM_SYSCALL, the result is pretty much the same as
> with SA_RESTART (see <http://bugs.gnu.org/14640>): when SCM_ASYNC_TICK
> is called right after we get EINTR, chances are that the async hasn’t
> been queued yet, so we get back to our read(2) call, and thus the
> Scheme-level signal handler is never called.  (Typically, when running
> the test through strace, it passes, because the timing is “better”, but
> it fails without strace.)

Right, so the problem is that, when Guile is built with thread support,
our signal delivery mechanism depends on the signal handling thread
executing, which adds an unpredictable amount of latency.

Initially I looked at how to fix the test case to work around this
problem, but really I think we need to fix the way that signals are
delivered.  If one chooses to deliver signals to a thread that's doing a
'read' (or other interruptible system call), then we ought to arrange
things so that the async is queued in time to be run before restarting
the call.

I think the best solution is to get rid of our internal signal handler
thread altogether, and instead arrange for signals to be delivered
directly to the thread that the user specified, by setting the thread
signal masks appropriately.  The C-level signal handler would then set
some global state that would be noticed by the SCM_SYSCALL loop.

In some ways, this would bring us closer to the non-thread signal
handling mechanism in scmsigs.c, which queued the asyncs directly from
the signal handler.  Unfortunately, that code is not safe.  For example,
if the non-thread 'take_signal' (the second one in scmsigs.c) is run
while 'scm_async_click' (async.c) is in between the following two lines:

      asyncs = t->active_asyncs;
      t->active_asyncs = SCM_EOL;

Then the signal will be lost.  Other problems can happen if the
non-threaded 'take_signal' interrupts itself (e.g. if two different
signals are delivered at nearly the same time).

So we'd need to devise a new mechanism that _is_ safe.
It is certainly doable.

If you're okay with this general approach, I'll look into it.

What do you think?

      Mark



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: SCM_SYSCALL
  2013-07-06 16:41         ` SCM_SYSCALL Mark H Weaver
@ 2013-07-06 21:05           ` Ludovic Courtès
  0 siblings, 0 replies; 9+ messages in thread
From: Ludovic Courtès @ 2013-07-06 21:05 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: guile-devel

Mark H Weaver <mhw@netris.org> skribis:

> ludo@gnu.org (Ludovic Courtès) writes:
>
>> Mark H Weaver <mhw@netris.org> skribis:
>>
>>> Hmm.  Shouldn't our signal handlers be run in a different thread?  Maybe
>>> we can't make this change until 2.2, but it seems to me that there are
>>> very serious problems trying to run signal handlers from within asyncs,
>>> analogous to the problems running finalizers within asyncs.  Commonly,
>>> signal handlers need to mutate some global state, but that cannot in
>>> general be done safely from within asyncs, because asyncs might be
>>> called while the global state is in an inconsistent state, at least for
>>> data structures implemented in Scheme.
>>>
>>> What do you think?

[...]

>> However, with a fixed SCM_SYSCALL, the result is pretty much the same as
>> with SA_RESTART (see <http://bugs.gnu.org/14640>): when SCM_ASYNC_TICK
>> is called right after we get EINTR, chances are that the async hasn’t
>> been queued yet, so we get back to our read(2) call, and thus the
>> Scheme-level signal handler is never called.  (Typically, when running
>> the test through strace, it passes, because the timing is “better”, but
>> it fails without strace.)
>
> Right, so the problem is that, when Guile is built with thread support,
> our signal delivery mechanism depends on the signal handling thread
> executing, which adds an unpredictable amount of latency.
>
> Initially I looked at how to fix the test case to work around this
> problem, but really I think we need to fix the way that signals are
> delivered.  If one chooses to deliver signals to a thread that's doing a
> 'read' (or other interruptible system call), then we ought to arrange
> things so that the async is queued in time to be run before restarting
> the call.
>
> I think the best solution is to get rid of our internal signal handler
> thread altogether, and instead arrange for signals to be delivered
> directly to the thread that the user specified, by setting the thread
> signal masks appropriately.  The C-level signal handler would then set
> some global state that would be noticed by the SCM_SYSCALL loop.

The C-level handler must restrict itself to async-signal-safe functions,
which excludes GC_malloc for instance (leading to hard-to-fix issues
like those you identified in the 2nd ‘take_signal’.)

Another issue is that, at the Scheme level, the signal is specified to
be delivered to the thread that called ‘sigaction’ or to whatever thread
was specified in the ‘sigaction’ call.  It’s unclear to me that this
could be achieved with just pthread_sigmask (which is missing on some
platforms, such as MinGW.)

> In some ways, this would bring us closer to the non-thread signal
> handling mechanism in scmsigs.c, which queued the asyncs directly from
> the signal handler.  Unfortunately, that code is not safe.  For example,
> if the non-thread 'take_signal' (the second one in scmsigs.c) is run
> while 'scm_async_click' (async.c) is in between the following two lines:
>
>       asyncs = t->active_asyncs;
>       t->active_asyncs = SCM_EOL;
>
> Then the signal will be lost.  Other problems can happen if the
> non-threaded 'take_signal' interrupts itself (e.g. if two different
> signals are delivered at nearly the same time).

Indeed.

> So we'd need to devise a new mechanism that _is_ safe.
> It is certainly doable.
>
> If you're okay with this general approach, I'll look into it.

Well, this is a can of worms, so we’d rather open it in 2.1, IMO.  :-)
It’s not clear to me that this can be improved without breaking
something else, but I’m all ears.

Back to the problem at hand: do you have any idea on how to write a test
case?  If not, I would just commit the SCM_SYSCALL fix.

Thanks,
Ludo’.



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: SCM_SYSCALL
  2013-06-23 21:25 SCM_SYSCALL Ludovic Courtès
  2013-07-03 18:19 ` SCM_SYSCALL Mark H Weaver
@ 2013-07-17 16:04 ` Ludovic Courtès
  2014-03-23 19:56 ` SCM_SYSCALL Andy Wingo
  2 siblings, 0 replies; 9+ messages in thread
From: Ludovic Courtès @ 2013-07-17 16:04 UTC (permalink / raw)
  To: guile-devel

ludo@gnu.org (Ludovic Courtès) skribis:

> We have this (since 2010):
>
> #   define SCM_SYSCALL(line)                    \
>   do                                            \
>     {                                           \
>       errno = 0;                                \
>       line;                                     \
>       if (errno == EINTR)                       \
>         {                                       \
>           SCM_ASYNC_TICK;                       \
>           continue;                             \
>         }                                       \
>     }                                           \
>   while(0)
>
> It turns out that the effect upon EINTR is to leave the loop.  So
> typically, fport_fill_input just throws to system-error and reveals the
> EINTR, contrary to SCM_SYSCALL intends to do.

Fixed in fe51c7b.

Thanks,
Ludo’.




^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: SCM_SYSCALL
  2013-06-23 21:25 SCM_SYSCALL Ludovic Courtès
  2013-07-03 18:19 ` SCM_SYSCALL Mark H Weaver
  2013-07-17 16:04 ` SCM_SYSCALL Ludovic Courtès
@ 2014-03-23 19:56 ` Andy Wingo
  2 siblings, 0 replies; 9+ messages in thread
From: Andy Wingo @ 2014-03-23 19:56 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guile-devel

On Sun 23 Jun 2013 23:25, ludo@gnu.org (Ludovic Courtès) writes:

> We have this (since 2010):
>
> #   define SCM_SYSCALL(line)                    \
>   do                                            \
>     {                                           \
>       errno = 0;                                \
>       line;                                     \
>       if (errno == EINTR)                       \
>         {                                       \
>           SCM_ASYNC_TICK;                       \
>           continue;                             \
>         }                                       \
>     }                                           \
>   while(0)
>
> It turns out that the effect upon EINTR is to leave the loop.

ZOMG.  So embarrassed.

Sheepishly,

Andy
-- 
http://wingolog.org/



^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2014-03-23 19:56 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-06-23 21:25 SCM_SYSCALL Ludovic Courtès
2013-07-03 18:19 ` SCM_SYSCALL Mark H Weaver
2013-07-04 22:28   ` SCM_SYSCALL Ludovic Courtès
2013-07-05 18:56     ` SCM_SYSCALL Mark H Weaver
2013-07-05 20:01       ` SCM_SYSCALL Ludovic Courtès
2013-07-06 16:41         ` SCM_SYSCALL Mark H Weaver
2013-07-06 21:05           ` SCM_SYSCALL Ludovic Courtès
2013-07-17 16:04 ` SCM_SYSCALL Ludovic Courtès
2014-03-23 19:56 ` SCM_SYSCALL Andy Wingo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).