From mboxrd@z Thu Jan 1 00:00:00 1970 From: Mark H Weaver Subject: bug#28211: Stack marking issue in multi-threaded code Date: Fri, 29 Jun 2018 12:54:47 -0400 Message-ID: <87in61y1m0.fsf@netris.org> References: <877exuj58y.fsf@gnu.org> <87d0yo1tie.fsf@gnu.org> <87fu3124nt.fsf@gnu.org> <87d0y5k6sl.fsf@netris.org> <871sel6vnq.fsf@igalia.com> <87fu30dmx3.fsf@netris.org> <87tvrg3q1d.fsf@igalia.com> <87a7rdvdm9.fsf_-_@gnu.org> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Return-path: Received: from eggs.gnu.org ([2001:4830:134:3::10]:37455) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fYwhl-0005DS-Ic for bug-guix@gnu.org; Fri, 29 Jun 2018 12:57:06 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fYwhi-0005GX-Fi for bug-guix@gnu.org; Fri, 29 Jun 2018 12:57:05 -0400 Received: from debbugs.gnu.org ([208.118.235.43]:60373) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1fYwhi-0005Ff-BB for bug-guix@gnu.org; Fri, 29 Jun 2018 12:57:02 -0400 Sender: "Debbugs-submit" Resent-Message-ID: In-Reply-To: <87a7rdvdm9.fsf_-_@gnu.org> ("Ludovic \=\?utf-8\?Q\?Court\=C3\=A8s\?\= \=\?utf-8\?Q\?\=22's\?\= message of "Fri, 29 Jun 2018 17:03:42 +0200") List-Id: Bug reports for GNU Guix List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-guix-bounces+gcggb-bug-guix=m.gmane.org@gnu.org Sender: "bug-Guix" To: Ludovic =?UTF-8?Q?Court=C3=A8s?= Cc: Andy Wingo , 28211@debbugs.gnu.org Hi Ludovic, ludo@gnu.org (Ludovic Court=C3=A8s) writes: > I have a fix for some (most?) of the crashes we were seeing while > running multi-threaded code such as (guix build compile), and, > presumably, the grafting code mentioned at the beginning of this bug > report, although I haven=E2=80=99t checked yet. > > So, =E2=80=98scm_i_vm_mark_stack=E2=80=99 marks the stack precisely, but = contrary to > what I suspected, precise marking is not at fault. > > Instead, the problem has to do with the fact that some VM instructions > change the frame pointer (vp->fp) before they have set up the dynamic > link for that new frame. > > As a consequence, if a stop-the-world GC is triggered after vp->fp has > been changed and before its dynamic link has been set, the stack-walking > loop in =E2=80=98scm_i_vm_mark_stack=E2=80=99 could stop very early, leav= ing a lot of > objects unmarked. > > The patch below fixes the problem for me. \o/ That's great news! Thanks for investigating. > I=E2=80=99m thinking we could perhaps add a compiler barrier before =E2= =80=98vp->fp =3D new_fp=E2=80=99 > statements, but in practice it=E2=80=99s not necessary here (x86_64, gcc = 7). > > Thoughts? Indeed, we need something to ensure that the compiler won't reorder these writes. My recommendation would be to use the 'volatile' qualifier in the declarations of both the 'fp' field of 'struct scm_vm' and the stack element modified by SCM_FRAME_SET_DYNAMIC_LINK. Maybe something like this: --8<---------------cut here---------------start------------->8--- diff --git a/libguile/frames.h b/libguile/frames.h index ef2db3df5..8554f886b 100644 --- a/libguile/frames.h +++ b/libguile/frames.h @@ -89,6 +89,7 @@ union scm_vm_stack_element { scm_t_uintptr as_uint; + volatile scm_t_uintptr as_volatile_uint; scm_t_uint32 *as_ip; SCM as_scm; double as_f64; @@ -104,7 +105,7 @@ union scm_vm_stack_element #define SCM_FRAME_RETURN_ADDRESS(fp) ((fp)[0].as_ip) #define SCM_FRAME_SET_RETURN_ADDRESS(fp, ra) ((fp)[0].as_ip =3D (ra)) #define SCM_FRAME_DYNAMIC_LINK(fp) ((fp) + (fp)[1].as_uint) -#define SCM_FRAME_SET_DYNAMIC_LINK(fp, dl) ((fp)[1].as_uint =3D ((dl) - (f= p))) +#define SCM_FRAME_SET_DYNAMIC_LINK(fp, dl) ((fp)[1].as_volatile_uint =3D (= (dl) - (fp))) #define SCM_FRAME_SLOT(fp,i) ((fp) - (i) - 1) #define SCM_FRAME_LOCAL(fp,i) (SCM_FRAME_SLOT (fp, i)->as_scm) #define SCM_FRAME_NUM_LOCALS(fp, sp) ((fp) - (sp)) diff --git a/libguile/vm.h b/libguile/vm.h index a1cac391f..0a6179c19 100644 --- a/libguile/vm.h +++ b/libguile/vm.h @@ -38,7 +38,7 @@ enum { struct scm_vm { scm_t_uint32 *ip; /* instruction pointer */ union scm_vm_stack_element *sp; /* stack pointer */ - union scm_vm_stack_element *fp; /* frame pointer */ + union scm_vm_stack_element *volatile fp; /* frame pointer */ union scm_vm_stack_element *stack_limit; /* stack limit address */ int trace_level; /* traces enabled if trace_level > 0 */ union scm_vm_stack_element *sp_min_since_gc; /* deepest sp since last gc= */ --8<---------------cut here---------------end--------------->8--- > I=E2=80=99d like to push this real soon. I=E2=80=99ll also do more testi= ng on real > workloads from Guix, and then I=E2=80=99d like to release 2.2.4, hopefully > within a few days. Sounds good to me. Thanks, Mark