From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andy Wingo
Subject: bug#28211: Stack marking issue in multi-threaded code, 2020 edition
Date: Tue, 17 Mar 2020 22:16:22 +0100
Message-ID: <87a74eznq1.fsf@pobox.com>
References: <877exuj58y.fsf@gnu.org> <87d0yo1tie.fsf@gnu.org>
 <87fu3124nt.fsf@gnu.org> <87d0y5k6sl.fsf@netris.org>
 <871sel6vnq.fsf@igalia.com> <87fu30dmx3.fsf@netris.org>
 <87tvrg3q1d.fsf@igalia.com> <87a7rdvdm9.fsf_-_@gnu.org>
 <87tv2tp74g.fsf_-_@gnu.org>
In-Reply-To: <877exuj58y.fsf@gnu.org>
To: Ludovic Courtès
Cc: 28211@debbugs.gnu.org

On Thu 12 Mar 2020 22:59, Ludovic Courtès writes:

> I think I’ve found another race condition involving stack marking, as a
> followup to (this time on
> 3.0.1+, but the code is almost the same.)
>
> ‘abort_to_prompt’ does this:
>
>   fp = vp->stack_top - fp_offset;
>   sp = vp->stack_top - sp_offset;
>
>   /* Continuation gets nargs+1 values: the one more is for the cont.  */
>   sp = sp - nargs - 1;
>
>   /* Shuffle abort arguments down to the prompt continuation.  We have
>      to be jumping to an older part of the stack.  */
>   if (sp < vp->sp)
>     abort ();
>   sp[nargs].as_scm = cont;
>   while (nargs--)
>     sp[nargs] = vp->sp[nargs];
>
>   /* Restore VM regs */
>   vp->fp = fp;
>   vp->sp = sp;
>   vp->ip = vra;
>
>
> What if ‘scm_i_vm_mark_stack’ walks the stack right before the ‘vp->fp’
> assignment?  It can determine that one of the just-assigned ‘sp[nargs]’
> is a dead slot, and thus set it to SCM_UNSPECIFIED.

I think you're right here.  Given that the most-recently-pushed frame is
marked conservatively, I think it would be sufficient to reset vp->fp
before shuffling stack args; that would make it so that the frame
includes the values to shuffle, their target locations, and probably
some other crap in between.  Given that marking the crap is harmless, I
think that would be enough.  WDYT?

In a more perfect world, initiating GC should tell threads to reach a
safepoint and mark their own stacks -- preserves thread locality and
prevents this class of bug.  But given that libgc uses signals to stop
threads, we have to be less precise.

Cheers,

Andy
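
For concreteness, here is a rough sketch of the reordering suggested
above: store the new fp first, then shuffle the arguments, then store
sp.  It is a toy model and not a patch against Guile.  The names
toy_slot, toy_vm and toy_abort_to_prompt are made up for illustration,
the offsets and continuation are passed in as plain parameters, and the
ip register is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the VM types; not Guile's real definitions.  */
union toy_slot
{
  uintptr_t as_uint;
  void *as_ptr;
};

struct toy_vm
{
  union toy_slot *stack_top;   /* highest address; the stack grows down */
  union toy_slot *fp;          /* frame pointer of the newest frame */
  union toy_slot *sp;          /* stack pointer, lowest live slot */
};

/* Toy version of the unwind step, with the fp store moved up.  */
static void
toy_abort_to_prompt (struct toy_vm *vp, ptrdiff_t fp_offset,
                     ptrdiff_t sp_offset, size_t nargs, void *cont)
{
  union toy_slot *fp = vp->stack_top - fp_offset;
  union toy_slot *sp = vp->stack_top - sp_offset;

  /* Continuation gets nargs+1 values: the one more is for the cont.  */
  sp = sp - nargs - 1;

  /* We have to be jumping to an older part of the stack.  */
  if (sp < vp->sp)
    abort ();

  /* Reset fp *before* shuffling.  The newest frame, from the old
     vp->sp up to the new vp->fp, now covers both the source slots and
     their targets, so a conservative marker scanning that frame sees
     all of them.  */
  vp->fp = fp;

  /* Shuffle abort arguments down to the prompt continuation, copying
     from the highest index so overlapping ranges stay intact.  */
  sp[nargs].as_ptr = cont;
  while (nargs--)
    sp[nargs] = vp->sp[nargs];

  /* Only now shrink the frame by publishing the new sp.  */
  vp->sp = sp;
}
```

This only shows the ordering in source code.  In a real patch one would
presumably also want to make sure the compiler does not move the
vp->fp store below the copies, since the marker can interrupt the
thread at any instruction; that part is my assumption and is not
covered by the sketch.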