From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mark H Weaver
Newsgroups: gmane.lisp.guile.devel
Subject: Re: SCM_SYSCALL
Date: Sat, 06 Jul 2013 12:41:31 -0400
Message-ID: <87fvvrabes.fsf@tines.lan>
References: <87li607c5l.fsf@gnu.org> <878v1nfqvn.fsf@tines.lan> <87zju27yeq.fsf@inria.fr> <878v1kbzuf.fsf@tines.lan> <87d2qwu66r.fsf@gnu.org>
In-Reply-To: <87d2qwu66r.fsf@gnu.org> ("Ludovic Courtès"'s message of "Fri, 05 Jul 2013 22:01:48 +0200")
To: ludo@gnu.org (Ludovic Courtès)
Cc: guile-devel@gnu.org
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)
List-Id: "Developers list for Guile, the GNU extensibility library"

Hi Ludovic,

ludo@gnu.org (Ludovic Courtès) writes:

> Mark H Weaver skribis:
>
>> Hmm. Shouldn't our signal handlers be run in a different thread? Maybe
>> we can't make this change until 2.2, but it seems to me that there are
>> very serious problems trying to run signal handlers from within asyncs,
>> analogous to the problems running finalizers within asyncs. Commonly,
>> signal handlers need to mutate some global state, but that cannot in
>> general be done safely from within asyncs, because asyncs might be
>> called while the global state is in an inconsistent state, at least for
>> data structures implemented in Scheme.
>>
>> What do you think?
>
> I think the rationale was that signal handlers in Guile would be a
> simplified version of what POSIX provides. That is, they are called in
> the thread that called ‘sigaction’, and there are no restrictions on
> what procedures can be used from within the handler. From that
> perspective, I think it fits the bill.
>
> Now, of course that introduces concurrency, but that’s what signals are
> about anyway: asynchronous notifications.
> Thus I don’t have any
> particular problems with this implementation.

I looked more carefully, and agree that our current API is fine. It
makes it easy to handle signals in a different thread, if desired, or to
avoid the complications of multi-threaded programming and rely instead
on blocking asyncs.

So, back to the problem at hand:

> However, with a fixed SCM_SYSCALL, the result is pretty much the same as
> with SA_RESTART (see ): when SCM_ASYNC_TICK
> is called right after we get EINTR, chances are that the async hasn’t
> been queued yet, so we get back to our read(2) call, and thus the
> Scheme-level signal handler is never called. (Typically, when running
> the test through strace, it passes, because the timing is “better”, but
> it fails without strace.)

Right, so the problem is that, when Guile is built with thread support,
our signal delivery mechanism depends on the signal handling thread
executing, which adds an unpredictable amount of latency.

Initially I looked at how to fix the test case to work around this
problem, but really I think we need to fix the way that signals are
delivered. If one chooses to deliver signals to a thread that's doing a
'read' (or other interruptible system call), then we ought to arrange
things so that the async is queued in time to be run before restarting
the call.

I think the best solution is to get rid of our internal signal handler
thread altogether, and instead arrange for signals to be delivered
directly to the thread that the user specified, by setting the thread
signal masks appropriately. The C-level signal handler would then set
some global state that would be noticed by the SCM_SYSCALL loop.

In some ways, this would bring us closer to the non-thread signal
handling mechanism in scmsigs.c, which queued the asyncs directly from
the signal handler. Unfortunately, that code is not safe.
For example, if the non-thread 'take_signal' (the second one in
scmsigs.c) is run while 'scm_async_click' (async.c) is in between the
following two lines:

      asyncs = t->active_asyncs;
      t->active_asyncs = SCM_EOL;

Then the signal will be lost. Other problems can happen if the
non-threaded 'take_signal' interrupts itself (e.g. if two different
signals are delivered at nearly the same time).

So we'd need to devise a new mechanism that _is_ safe. It is certainly
doable. If you're okay with this general approach, I'll look into it.

What do you think?

    Mark
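
A minimal sketch of the general approach described above, not Guile's
actual code: the C-level handler only sets bits in a global word, and the
retry loop around an interruptible call drains that word before
restarting. Every name here ('pending_signals', 'take_signal_sketch',
'run_pending_sketch', 'read_with_ticks', 'count_handler') is hypothetical.
It assumes that 'atomic_ulong' is lock-free on the target platform (so it
may be used from a signal handler) and that the signal numbers of interest
fit in one 'unsigned long'. Draining with a single atomic exchange is one
way to avoid the read-then-clear window shown in the two lines above.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <signal.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical: highest signal number (exclusive) that fits in the mask. */
#define SKETCH_MAX_SIG ((int) (sizeof (unsigned long) * CHAR_BIT))

/* Bit N set means signal N was delivered and not yet processed. */
static atomic_ulong pending_signals;

/* Stand-in for a Scheme-level handler: just counts invocations. */
static int handled_count;
void count_handler (int signum)
{
  (void) signum;
  handled_count++;
}

/* C-level signal handler: records the signal and returns.  A single
   atomic OR means a handler interrupting itself cannot lose a bit. */
void take_signal_sketch (int signum)
{
  if (signum > 0 && signum < SKETCH_MAX_SIG)
    atomic_fetch_or (&pending_signals, 1UL << signum);
}

/* Fetch and clear the pending set in one atomic step, then run the
   handler once per pending signal.  Returns the number handled. */
int run_pending_sketch (void (*handler) (int))
{
  assert (handler != NULL);
  unsigned long mask = atomic_exchange (&pending_signals, 0UL);
  int n = 0;
  for (int sig = 1; sig < SKETCH_MAX_SIG; sig++)
    if (mask & (1UL << sig))
      {
        handler (sig);
        n++;
      }
  return n;
}

/* Retry loop in the spirit of SCM_SYSCALL: on EINTR, process pending
   signals before restarting the call. */
ssize_t read_with_ticks (int fd, void *buf, size_t len,
                         void (*handler) (int))
{
  for (;;)
    {
      ssize_t n = read (fd, buf, len);
      if (n >= 0 || errno != EINTR)
        return n;
      run_pending_sketch (handler);
    }
}
```

To try it, install 'take_signal_sketch' with sigaction(2) without
SA_RESTART and call 'read_with_ticks' on a blocking descriptor. The
sketch still leaves a window in which a signal arrives after the pending
check but before read(2) blocks; closing that window needs something
like pselect(2) or a self-pipe.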