From mboxrd@z Thu Jan 1 00:00:00 1970 From: Mark H Weaver Subject: bug#28211: Stack marking issue in multi-threaded code Date: Sat, 30 Jun 2018 17:49:03 -0400 Message-ID: <87tvpkrlm8.fsf@netris.org> References: <877exuj58y.fsf@gnu.org> <87d0yo1tie.fsf@gnu.org> <87fu3124nt.fsf@gnu.org> <87d0y5k6sl.fsf@netris.org> <871sel6vnq.fsf@igalia.com> <87fu30dmx3.fsf@netris.org> <87tvrg3q1d.fsf@igalia.com> <87a7rdvdm9.fsf_-_@gnu.org> <87in61y1m0.fsf@netris.org> <87muvdthp4.fsf@gnu.org> <87efgpw5ao.fsf@netris.org> <87h8lkro7c.fsf@gnu.org> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Return-path: Received: from eggs.gnu.org ([2001:4830:134:3::10]:35320) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fZNlp-0001p2-FC for bug-guix@gnu.org; Sat, 30 Jun 2018 17:51:06 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fZNlm-0002EU-B6 for bug-guix@gnu.org; Sat, 30 Jun 2018 17:51:05 -0400 Received: from debbugs.gnu.org ([208.118.235.43]:33601) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1fZNlm-0002E4-7R for bug-guix@gnu.org; Sat, 30 Jun 2018 17:51:02 -0400 Sender: "Debbugs-submit" Resent-Message-ID: In-Reply-To: <87h8lkro7c.fsf@gnu.org> ("Ludovic \=\?utf-8\?Q\?Court\=C3\=A8s\=22'\?\= \=\?utf-8\?Q\?s\?\= message of "Sat, 30 Jun 2018 22:53:11 +0200") List-Id: Bug reports for GNU Guix List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-guix-bounces+gcggb-bug-guix=m.gmane.org@gnu.org Sender: "bug-Guix" To: Ludovic =?UTF-8?Q?Court=C3=A8s?= Cc: Andy Wingo , 28211@debbugs.gnu.org Hi Ludovic, ludo@gnu.org (Ludovic Court=C3=A8s) writes: > Mark H Weaver skribis: > >> ludo@gnu.org (Ludovic Court=C3=A8s) writes: > > [...] > >>>> Indeed, we need something to ensure that the compiler won't reorder >>>> these writes. My recommendation would be to use the 'volatile' >>>> qualifier in the declarations of both the 'fp' field of 'struct scm_vm' >>>> and the stack element modified by SCM_FRAME_SET_DYNAMIC_LINK. >>>> >>>> Maybe something like this: >>>> >>>> diff --git a/libguile/frames.h b/libguile/frames.h >>>> index ef2db3df5..8554f886b 100644 >>>> --- a/libguile/frames.h >>>> +++ b/libguile/frames.h >>>> @@ -89,6 +89,7 @@ >>>> union scm_vm_stack_element >>>> { >>>> scm_t_uintptr as_uint; >>>> + volatile scm_t_uintptr as_volatile_uint; >>> >>> [...] >>> >>>> +#define SCM_FRAME_SET_DYNAMIC_LINK(fp, dl) ((fp)[1].as_volatile_uint = =3D ((dl) - (fp))) >>> >>> I=E2=80=99m not sure this is exactly what we need. >>> >>> What I had in mind, to make sure the vp->fp assignment really happens >>> after the SCM_FRAME_SET_DYNAMIC_LINK, was to do something like this: >>> >>> SCM_FRAME_SET_RETURN_ADDRESS (new_fp, =E2=80=A6); >>> SCM_FRAME_SET_DYNAMIC_LINK (new_fp, =E2=80=A6); >>> >>> asm volatile ("" : : : "memory"); >>> >>> vp->fp =3D new_fp; >>> >>> I think that more accurately reflects what we want, WDYT? >>> >>> It=E2=80=99s GCC-specific though (I don=E2=80=99t think Clang implement= s =E2=80=98asm=E2=80=99 in a >>> compatible way, does it?) and I suppose we=E2=80=99d have to ignore the= non-GCC >>> case. >> >> Right, the problem is that the "asm volatile" solution is not portable. >> >> To my mind, it's fine to optionally use GCC extensions for improved >> performance, but I think we should ensure that Guile works reliably when >> compiled with any conforming C compiler. > > I agree, of course (that said, most compilers implement most GCC > extensions nowadays, but =E2=80=98asm=E2=80=99 is probably an exception). > >> What problem do you see with my proposed portable solution? If I >> understand C99 section 5.1.2.3 paragraph 2 correctly, compilers are not >> permitted to reorder accesses to volatile objects as long as there is a >> sequence point between them. > > My understand is that what you propose is (almost*) equivalent to the > asm trick, provided SCM_FRAME_SET_DYNAMIC_LINK is the last statement > before the vp->fp assignment. So we=E2=80=99d have to make sure we keep = things > in this order, right? I don't think it matters. This would only ensure the relative ordering of the SCM_FRAME_SET_DYNAMIC_LINK and the vp->fp assignment. IIUC, the non-volatile memory accesses could be reordered relative to either of them. The "asm volatile" memory barrier would be stronger, guaranteeing that _all_ accesses before the barrier would be performed before _all_ accessed after the barrier, within that thread. > [*] The =E2=80=98volatile=E2=80=99 qualifier on the field does more than = just an > instruction reordering barrier: AIUI, it forces the compiler to emit > a load and store instruction right at the assignment point, which is > a stricter constraint than what we need, I think. I'm not sure what "right at the assignment point" means exactly, given that the compiler is free to reorder things quite a bit. In addition to the C99 standard which I cited in my previous email, you might also find the following link informative: https://gcc.gnu.org/onlinedocs/gcc/Volatiles.html In particular: [...] The minimum requirement is that at a sequence point all previous accesses to volatile objects have stabilized and no subsequent accesses have occurred. Thus an implementation is free to reorder and combine volatile accesses that occur between sequence points, but cannot do so for accesses across a sequence point. The use of volatile does not allow you to violate the restriction on updating objects multiple times between two sequence points. =20=20 Accesses to non-volatile objects are not ordered with respect to volatile accesses. You cannot use a volatile object as a memory barrier to order a sequence of writes to non-volatile memory. For instance: =20=20 int *ptr =3D something; volatile int vobj; *ptr =3D something; vobj =3D 1; =20=20 Unless *ptr and vobj can be aliased, it is not guaranteed that the write to *ptr occurs by the time the update of vobj happens. If you need this guarantee, you must use a stronger memory barrier [...] >> I should say that I'm not confident that _either_ of these proposed >> solutions will adequately address all of the possible problems that >> could occur when GC is performed on VM threads stopped at arbitrary >> points in their execution. > > Yeah, as discussed on IRC with Andy, we=E2=80=99d be better off if we wer= e sure > that each stack is marked by the thread it belongs to, in terms of data > locality, and thus also in terms of being sure that vp->fp is up-to-date > when the marker reads it. It=E2=80=99s not something we can change now, = though. I'm not sure it matters what thread the marking is done in, because when the actual collection happens, all threads are first stopped in their signal handlers, and presumably the appropriate memory barriers are performed so that all threads are synchronized before the full collection. Now, I'm aware that libgc does incremental marking as well, but I'm optimistically hoping that they've done their homework to ensure that if anything might have changed between the incremental marking and the full collection, that the relevant objects will be marked again before the full collection. However, I admit that I don't know enough of libgc internals to know if this is really the case. What do you think? > Alternately, we could use atomics when accessing vp->fp to ensure memory > consistency (I tried that during my debugging journey=E2=80=A6). It coul= d be > costly. This would be far too costly to consider, in my opinion. > Anyway, I don=E2=80=99t think we=E2=80=99ll have the final word on all th= is before > 2.2.4. The way I see it we should keep working on improving it, but > there are difficult choices to make, so it will probably take a bit of > time. Sounds good. Thanks, Mark