unofficial mirror of bug-guix@gnu.org 
From: ludo@gnu.org (Ludovic Courtès)
To: Andy Wingo <wingo@igalia.com>
Cc: 28211@debbugs.gnu.org
Subject: bug#28211: Stack marking issue in multi-threaded code
Date: Fri, 29 Jun 2018 17:03:42 +0200
Message-ID: <87a7rdvdm9.fsf_-_@gnu.org>
In-Reply-To: <87tvrg3q1d.fsf@igalia.com> (Andy Wingo's message of "Thu, 10 May 2018 09:53:18 +0200")

[-- Attachment #1: Type: text/plain, Size: 1309 bytes --]

Hey hey, comrades!

I have a fix for some (most?) of the crashes we were seeing while
running multi-threaded code such as (guix build compile), and,
presumably, the grafting code mentioned at the beginning of this bug
report, although I haven’t checked yet.

So, ‘scm_i_vm_mark_stack’ marks the stack precisely, but contrary to
what I suspected, precise marking is not at fault.

Instead, the problem has to do with the fact that some VM instructions
change the frame pointer (vp->fp) before they have set up the dynamic
link for that new frame.

As a consequence, if a stop-the-world GC is triggered after vp->fp has
been changed and before its dynamic link has been set, the stack-walking
loop in ‘scm_i_vm_mark_stack’ could stop very early, leaving a lot of
objects unmarked.
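
To make the window concrete, here is a minimal toy model of the ordering issue. It is a sketch under stated assumptions, not Guile's code: `toy_frame`, `toy_vm`, `toy_mark_stack` and the two `toy_call_*` functions are hypothetical names, frames are modelled as a NULL-terminated linked list rather than Guile's real stack layout, and the "GC" is a plain function call placed at the critical point instead of a real stop-the-world collection.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy frame; Guile's real frame layout is different.  */
struct toy_frame
{
  struct toy_frame *dynamic_link;   /* caller's frame, NULL at the bottom */
  int marked;                       /* set by the toy marker */
};

struct toy_vm
{
  struct toy_frame *fp;             /* innermost frame */
};

/* Toy stand-in for the stack-walking loop: follow dynamic links from
   vp->fp and return the number of frames visited.  */
static size_t
toy_mark_stack (struct toy_vm *vp)
{
  size_t visited = 0;
  for (struct toy_frame *f = vp->fp; f != NULL; f = f->dynamic_link)
    {
      f->marked = 1;
      visited++;
    }
  return visited;
}

/* Old ordering: fp is published first.  The call to toy_mark_stack
   simulates a GC landing between the two stores.  */
static size_t
toy_call_fp_first (struct toy_vm *vp, struct toy_frame *new_fp)
{
  struct toy_frame *old_fp = vp->fp;
  new_fp->dynamic_link = NULL;      /* assumption: stale slot contents */
  vp->fp = new_fp;
  size_t visited = toy_mark_stack (vp);
  new_fp->dynamic_link = old_fp;
  return visited;
}

/* Patched ordering: the link is in place before fp is published.  */
static size_t
toy_call_link_first (struct toy_vm *vp, struct toy_frame *new_fp)
{
  struct toy_frame *old_fp = vp->fp;
  new_fp->dynamic_link = old_fp;
  vp->fp = new_fp;
  size_t visited = toy_mark_stack (vp);  /* GC right after publication */
  assert (new_fp->dynamic_link == old_fp);
  return visited;
}
```

With three frames already on the toy stack, `toy_call_fp_first` lets the marker visit only the new frame, whereas `toy_call_link_first` lets it reach every frame. In the real VM the stale slot would hold whatever was left on the stack, so the walk could end up anywhere; NULL is just the simplest stand-in.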

The patch below fixes the problem for me.  \o/

I’m thinking we could perhaps add a compiler barrier before the ‘vp->fp = new_fp’
statements, so that the compiler cannot move that store above the dynamic
link setup, but in practice it’s not necessary here (x86_64, gcc 7).
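
For illustration, such a barrier could look roughly like the sketch below. This is not part of the patch: `barrier_frame`, `barrier_vm` and `barrier_push_frame` are hypothetical names, and the sketch uses C11's `atomic_signal_fence`, which is meant to constrain compiler reordering without emitting a hardware fence. With GCC, an empty `asm volatile` statement with a "memory" clobber is a common alternative. Whether a compiler-only barrier is sufficient depends on how the collector stops the mutator threads, which this sketch does not address.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical toy types; they do not mirror Guile's frame layout.  */
struct barrier_frame
{
  struct barrier_frame *dynamic_link;   /* caller's frame */
};

struct barrier_vm
{
  struct barrier_frame *fp;             /* innermost frame */
};

/* Set up the new frame's dynamic link, then publish it as vp->fp.  */
static void
barrier_push_frame (struct barrier_vm *vp, struct barrier_frame *new_fp)
{
  struct barrier_frame *old_fp = vp->fp;

  new_fp->dynamic_link = old_fp;

  /* Compiler-only fence: intended to keep the compiler from moving the
     store to vp->fp above the store to the dynamic link.  */
  atomic_signal_fence (memory_order_seq_cst);

  vp->fp = new_fp;
  assert (vp->fp->dynamic_link == old_fp);
}
```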

Thoughts?

I’d like to push this real soon.  I’ll also do more testing on real
workloads from Guix, and then I’d like to release 2.2.4, hopefully
within a few days.

Thank you and thanks Andy for the discussions on IRC!

Ludo’, who’s going to party all night long.  :-)


[-- Attachment #2: Type: text/x-patch, Size: 3257 bytes --]

diff --git a/libguile/vm-engine.c b/libguile/vm-engine.c
index 1aa4e9699..19ff3e498 100644
--- a/libguile/vm-engine.c
+++ b/libguile/vm-engine.c
@@ -548,7 +548,7 @@ VM_NAME (scm_i_thread *thread, struct scm_vm *vp,
   VM_DEFINE_OP (1, call, "call", OP2 (X8_F24, X8_C24))
     {
       scm_t_uint32 proc, nlocals;
-      union scm_vm_stack_element *old_fp;
+      union scm_vm_stack_element *old_fp, *new_fp;
 
       UNPACK_24 (op, proc);
       UNPACK_24 (ip[1], nlocals);
@@ -556,9 +556,10 @@ VM_NAME (scm_i_thread *thread, struct scm_vm *vp,
       PUSH_CONTINUATION_HOOK ();
 
       old_fp = vp->fp;
-      vp->fp = SCM_FRAME_SLOT (old_fp, proc - 1);
-      SCM_FRAME_SET_DYNAMIC_LINK (vp->fp, old_fp);
-      SCM_FRAME_SET_RETURN_ADDRESS (vp->fp, ip + 2);
+      new_fp = SCM_FRAME_SLOT (old_fp, proc - 1);
+      SCM_FRAME_SET_DYNAMIC_LINK (new_fp, old_fp);
+      SCM_FRAME_SET_RETURN_ADDRESS (new_fp, ip + 2);
+      vp->fp = new_fp;
 
       RESET_FRAME (nlocals);
 
@@ -586,7 +587,7 @@ VM_NAME (scm_i_thread *thread, struct scm_vm *vp,
     {
       scm_t_uint32 proc, nlocals;
       scm_t_int32 label;
-      union scm_vm_stack_element *old_fp;
+      union scm_vm_stack_element *old_fp, *new_fp;
 
       UNPACK_24 (op, proc);
       UNPACK_24 (ip[1], nlocals);
@@ -595,9 +596,10 @@ VM_NAME (scm_i_thread *thread, struct scm_vm *vp,
       PUSH_CONTINUATION_HOOK ();
 
       old_fp = vp->fp;
-      vp->fp = SCM_FRAME_SLOT (old_fp, proc - 1);
-      SCM_FRAME_SET_DYNAMIC_LINK (vp->fp, old_fp);
-      SCM_FRAME_SET_RETURN_ADDRESS (vp->fp, ip + 3);
+      new_fp = SCM_FRAME_SLOT (old_fp, proc - 1);
+      SCM_FRAME_SET_DYNAMIC_LINK (new_fp, old_fp);
+      SCM_FRAME_SET_RETURN_ADDRESS (new_fp, ip + 3);
+      vp->fp = new_fp;
 
       RESET_FRAME (nlocals);
 
@@ -3893,7 +3895,7 @@ VM_NAME (scm_i_thread *thread, struct scm_vm *vp,
         NEXT (1);
 
       {
-        union scm_vm_stack_element *old_fp;
+        union scm_vm_stack_element *old_fp, *new_fp;
         size_t old_frame_size = FRAME_LOCALS_COUNT ();
         SCM proc = scm_i_async_pop (thread);
 
@@ -3907,9 +3909,10 @@ VM_NAME (scm_i_thread *thread, struct scm_vm *vp,
            handle-interrupts opcode to handle any additional
            interrupts.  */
         old_fp = vp->fp;
-        vp->fp = SCM_FRAME_SLOT (old_fp, old_frame_size + 1);
-        SCM_FRAME_SET_DYNAMIC_LINK (vp->fp, old_fp);
-        SCM_FRAME_SET_RETURN_ADDRESS (vp->fp, ip);
+        new_fp = SCM_FRAME_SLOT (old_fp, old_frame_size + 1);
+        SCM_FRAME_SET_DYNAMIC_LINK (new_fp, old_fp);
+        SCM_FRAME_SET_RETURN_ADDRESS (new_fp, ip);
+        vp->fp = new_fp;
 
         SP_SET (0, proc);
 
diff --git a/libguile/vm.c b/libguile/vm.c
index c8ec6e1b2..7749159e5 100644
--- a/libguile/vm.c
+++ b/libguile/vm.c
@@ -1011,6 +1011,18 @@ scm_i_vm_mark_stack (struct scm_vm *vp, struct GC_ms_entry *mark_stack_ptr,
       slot_map = find_slot_map (SCM_FRAME_RETURN_ADDRESS (fp), &cache);
     }
 
+  size_t extra = 0;
+  for (; sp < vp->stack_top; sp++)
+    {
+      if (GC_is_heap_ptr (sp->as_ptr))
+        extra++;
+    }
+  if (extra)
+    {
+      printf ("%s extra: %zi\n", __func__, extra);
+      abort ();
+    }
+
   return_unused_stack_to_os (vp);
 
   return mark_stack_ptr;
