From: ludo@gnu.org (Ludovic Courtès)
Newsgroups: gmane.lisp.guile.devel
Subject: Re: SCM_SYSCALL
Date: Sat, 06 Jul 2013 23:05:43 +0200
Message-ID: <87li5jo0uw.fsf@gnu.org>
References: <87li607c5l.fsf@gnu.org> <878v1nfqvn.fsf@tines.lan> <87zju27yeq.fsf@inria.fr> <878v1kbzuf.fsf@tines.lan> <87d2qwu66r.fsf@gnu.org> <87fvvrabes.fsf@tines.lan>
To: Mark H Weaver
Cc: guile-devel@gnu.org
In-Reply-To: <87fvvrabes.fsf@tines.lan> (Mark H. Weaver's message of "Sat, 06 Jul 2013 12:41:31 -0400")

Mark H Weaver skribis:

> ludo@gnu.org (Ludovic Courtès) writes:
>
>> Mark H Weaver skribis:
>>
>>> Hmm. Shouldn't our signal handlers be run in a different thread? Maybe
>>> we can't make this change until 2.2, but it seems to me that there are
>>> very serious problems trying to run signal handlers from within asyncs,
>>> analogous to the problems running finalizers within asyncs.
>>> Commonly, signal handlers need to mutate some global state, but that
>>> cannot in general be done safely from within asyncs, because asyncs
>>> might be called while the global state is in an inconsistent state, at
>>> least for data structures implemented in Scheme.
>>>
>>> What do you think?

[...]

>> However, with a fixed SCM_SYSCALL, the result is pretty much the same as
>> with SA_RESTART (see ): when SCM_ASYNC_TICK
>> is called right after we get EINTR, chances are that the async hasn’t
>> been queued yet, so we get back to our read(2) call, and thus the
>> Scheme-level signal handler is never called. (Typically, when running
>> the test through strace, it passes, because the timing is “better”, but
>> it fails without strace.)
>
> Right, so the problem is that, when Guile is built with thread support,
> our signal delivery mechanism depends on the signal handling thread
> executing, which adds an unpredictable amount of latency.
>
> Initially I looked at how to fix the test case to work around this
> problem, but really I think we need to fix the way that signals are
> delivered. If one chooses to deliver signals to a thread that's doing a
> 'read' (or other interruptible system call), then we ought to arrange
> things so that the async is queued in time to be run before restarting
> the call.
>
> I think the best solution is to get rid of our internal signal handler
> thread altogether, and instead arrange for signals to be delivered
> directly to the thread that the user specified, by setting the thread
> signal masks appropriately. The C-level signal handler would then set
> some global state that would be noticed by the SCM_SYSCALL loop.

The C-level handler must restrict itself to async-signal-safe functions,
which excludes GC_malloc, for instance (leading to hard-to-fix issues
like those you identified in the second ‘take_signal’).
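A minimal C sketch of the combination discussed above, assuming POSIX signals and a single thread: the C-level handler only stores into a `volatile sig_atomic_t` flag (an async-signal-safe operation, unlike calling GC_malloc), and an EINTR retry loop in the spirit of SCM_SYSCALL consults that flag before restarting the call. The names `pending_signal`, `handled`, `run_pending_handlers`, and `read_retrying` are hypothetical; this is not Guile's actual SCM_SYSCALL or scmsigs.c code, and it does not address which thread the signal is delivered to.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical flag: the only state the C-level handler touches.  */
static volatile sig_atomic_t pending_signal = 0;

/* Hypothetical record of the last signal passed on to "Scheme".  */
static int handled = 0;

/* C-level handler: async-signal-safe work only (no GC_malloc, no
   queuing of asyncs).  It merely records which signal arrived.  */
static void
handler (int sig)
{
  pending_signal = sig;
}

/* Hypothetical stand-in for running the Scheme-level handler from
   normal (non-signal) context, where allocation is allowed.  This
   simple version can drop a signal that arrives between reading and
   resetting the flag.  */
static void
run_pending_handlers (void)
{
  int sig = pending_signal;
  if (sig != 0)
    {
      pending_signal = 0;
      handled = sig;
    }
}

/* EINTR retry loop in the spirit of SCM_SYSCALL.  When the signal is
   delivered to this thread, the handler has already run by the time
   read() fails with EINTR, so the flag is set before the call is
   restarted.  */
static ssize_t
read_retrying (int fd, void *buf, size_t count)
{
  for (;;)
    {
      ssize_t r = read (fd, buf, count);
      if (r >= 0 || errno != EINTR)
        return r;
      run_pending_handlers ();
    }
}
```

The point of the design is that no latency from a separate signal-handling thread is involved: the flag is written synchronously on the interrupted thread, so the retry loop cannot miss it the way the current SCM_ASYNC_TICK check can.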
Another issue is that, at the Scheme level, the signal is specified to be
delivered either to the thread that called ‘sigaction’ or to whatever
thread was specified in the ‘sigaction’ call. It’s unclear to me that
this could be achieved with just pthread_sigmask (which is missing on
some platforms, such as MinGW).

> In some ways, this would bring us closer to the non-thread signal
> handling mechanism in scmsigs.c, which queued the asyncs directly from
> the signal handler. Unfortunately, that code is not safe. For example,
> if the non-thread 'take_signal' (the second one in scmsigs.c) is run
> while 'scm_async_click' (async.c) is in between the following two lines:
>
>   asyncs = t->active_asyncs;
>   t->active_asyncs = SCM_EOL;
>
> Then the signal will be lost. Other problems can happen if the
> non-threaded 'take_signal' interrupts itself (e.g. if two different
> signals are delivered at nearly the same time).

Indeed.

> So we'd need to devise a new mechanism that _is_ safe.
> It is certainly doable.
>
> If you're okay with this general approach, I'll look into it.

Well, this is a can of worms, so we’d rather open it in 2.1, IMO. :-)
It’s not clear to me that this can be improved without breaking
something else, but I’m all ears.

Back to the problem at hand: do you have any idea how to write a test
case? If not, I would just commit the SCM_SYSCALL fix.

Thanks,
Ludo’.
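The lost-signal race Mark describes comes from splitting "take the pending list" and "reset the pending list" into two separate, non-atomic steps. One possible way to close that window, sketched here under the assumption of a C11 compiler with lock-free `atomic_uint` and signal numbers below 32, is to have the handler set a bit with `atomic_fetch_or` and have the consumer claim every pending bit with a single `atomic_exchange`. The names `record_signal` and `take_pending_signals` are hypothetical; this is not how Guile's async.c or scmsigs.c work.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdatomic.h>

/* C11 only permits atomics in signal handlers when they are lock-free.  */
_Static_assert (ATOMIC_INT_LOCK_FREE == 2,
                "signal handlers need lock-free atomic_uint");

/* Hypothetical set of pending signals, one bit per signal number.
   Assumes signal numbers are below 32 (true for the classic signals
   on GNU/Linux; real-time signals are not covered).  */
static atomic_uint pending_signals;

/* Producer: the C-level handler.  A single atomic read-modify-write,
   so a nested handler for another signal cannot overwrite this bit.  */
static void
record_signal (int sig)
{
  if (sig > 0 && sig < 32)
    atomic_fetch_or (&pending_signals, 1u << sig);
}

/* Consumer: claim all pending signals in one atomic step.  Unlike the
   two-line "read list, then reset list" sequence, there is no window
   in which a newly arrived signal can be discarded.  */
static unsigned int
take_pending_signals (void)
{
  return atomic_exchange (&pending_signals, 0u);
}
```

Repeated deliveries of the same signal before the consumer runs coalesce into one bit, which matches the POSIX behavior for non-queued signals. The sketch does nothing about per-thread delivery or platforms lacking pthread_sigmask.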