From: Ludovic Courtès
Newsgroups: gmane.lisp.guile.bugs
Subject: bug#28211: Stack marking issue in multi-threaded code, 2020 edition
Date: Wed, 18 Mar 2020 00:22:37 +0100
Message-ID: <87mu8ebm82.fsf@gnu.org>
In-Reply-To: <87a74eznq1.fsf@pobox.com>
To: Andy Wingo
Cc: 28211@debbugs.gnu.org

Hi!

Andy Wingo wrote:

> On Thu 12 Mar 2020 22:59, Ludovic Courtès writes:
>
>> I think I’ve found another race condition involving stack marking, as a
>> followup to (this time on
>> 3.0.1+, but the code is almost the same.)
>>
>> ‘abort_to_prompt’ does this:
>>
>>   fp = vp->stack_top - fp_offset;
>>   sp = vp->stack_top - sp_offset;
>>
>>   /* Continuation gets nargs+1 values: the one more is for the cont. */
>>   sp = sp - nargs - 1;
>>
>>   /* Shuffle abort arguments down to the prompt continuation. We have
>>      to be jumping to an older part of the stack. */
>>   if (sp < vp->sp)
>>     abort ();
>>   sp[nargs].as_scm = cont;
>>   while (nargs--)
>>     sp[nargs] = vp->sp[nargs];
>>
>>   /* Restore VM regs */
>>   vp->fp = fp;
>>   vp->sp = sp;
>>   vp->ip = vra;
>>
>>
>> What if ‘scm_i_vm_mark_stack’ walks the stack right before the ‘vp->fp’
>> assignment? It can determine that one of the just-assigned ‘sp[nargs]’
>> is a dead slot, and thus set it to SCM_UNSPECIFIED.
>
> I think you're right here.
>
> Given that the most-recently-pushed frame is marked conservatively, I
> think it would be sufficient to reset vp->fp before shuffling stack
> args; that would make it so that the frame includes the values to
> shuffle, their target locations, and probably some other crap in
> between. Given that marking the crap is harmless, I think that would be
> enough. WDYT?

Sounds good. Following our discussion on IRC, I pushed what you
proposed as 89edd1bc2dcff50fb05c3598a846d6b51b172f7c. \o/

I confirmed with and without rr that it no longer triggers the dreaded
crash.

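To make the ordering issue concrete, here is a toy model of the race. It
is only a sketch, not libguile code: slots are plain ints, the stack is a
fixed array growing toward index 0, and ‘struct toy_vm’, ‘toy_mark_stack’
and ‘toy_abort’ are hypothetical names. The toy marker keeps everything
in the innermost frame [sp, fp) and pessimistically assumes that every
slot in older frames is dead, which is harsher than real slot maps but
enough to show why resetting fp before the shuffle helps.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_SIZE 16
#define UNSPECIFIED (-1)  /* toy stand-in for SCM_UNSPECIFIED */

/* Toy VM: slots are plain ints; the stack grows toward index 0.  */
struct toy_vm
{
  int stack[STACK_SIZE];
  size_t fp;  /* index of the innermost frame pointer */
  size_t sp;  /* index of the stack pointer (newest slot) */
};

/* Toy marker.  Slots in [sp, fp) belong to the innermost frame and are
   kept conservatively.  Slots at or above fp belong to older frames; we
   ASSUME their slot maps declare everything dead, so they are clobbered.  */
void
toy_mark_stack (struct toy_vm *vp)
{
  for (size_t i = vp->fp; i < STACK_SIZE; i++)
    vp->stack[i] = UNSPECIFIED;
}

/* Toy abort: shuffle two values from the current sp to the prompt's
   frame, with the marker running at the worst possible moment.
   Returns how many of the shuffled arguments survived.  */
size_t
toy_abort (int reset_fp_first)
{
  struct toy_vm vm = { .fp = 6, .sp = 2 };
  size_t nargs = 2, prompt_fp = 14, prompt_sp = 10;
  size_t new_sp = prompt_sp - nargs - 1;

  /* We have to be jumping to an older part of the stack.  */
  assert (new_sp >= vm.sp);

  vm.stack[vm.sp + 0] = 100;
  vm.stack[vm.sp + 1] = 101;

  /* The proposed ordering: widen the innermost frame before shuffling.  */
  if (reset_fp_first)
    vm.fp = prompt_fp;

  vm.stack[new_sp + nargs] = 999;  /* stands for the continuation */
  for (size_t n = nargs; n-- > 0; )
    vm.stack[new_sp + n] = vm.stack[vm.sp + n];

  /* Simulate another thread's GC stopping us right here.  */
  toy_mark_stack (&vm);

  /* Restore VM regs.  */
  vm.fp = prompt_fp;
  vm.sp = new_sp;

  size_t survived = 0;
  for (size_t n = 0; n < nargs; n++)
    if (vm.stack[vm.sp + n] == 100 + (int) n)
      survived++;
  return survived;
}
```

In this model, ‘toy_abort (0)’ loses both shuffled arguments because the
marker sees their target slots as part of an older frame, whereas
‘toy_abort (1)’ keeps them since the widened innermost frame covers the
sources, the targets, and whatever lies in between.
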
BTW, pro tip: to run ./meta/guile under rr, I do:

  sed -i libguile/guile \
      -e 's/exec /exec rr record -n --syscall-buffer-sig=SIGUSR1 /g'

where ‘-n’ disables stack switching.

> In a more perfect world, initiating GC should tell threads to reach a
> safepoint and mark their own stacks -- preserves thread locality and
> prevents this class of bug. But given that libgc uses signals to stop
> threads, we have to be less precise.

Yup, agreed.

Thanks,
Ludo’.