From: "Ludovic Courtès" <ludo@gnu.org>
To: Andy Wingo <wingo@igalia.com>
Cc: 28211@debbugs.gnu.org
Subject: bug#28211: Stack marking issue in multi-threaded code, 2020 edition
Date: Thu, 12 Mar 2020 22:59:11 +0100
Message-ID: <87tv2tp74g.fsf_-_@gnu.org>
In-Reply-To: <87a7rdvdm9.fsf_-_@gnu.org> ("Ludovic Courtès"'s message of "Fri, 29 Jun 2018 17:03:42 +0200")
Hi!
I think I’ve found another race condition involving stack marking, as a
follow-up to <https://issues.guix.gnu.org/issue/28211> (this time on
Guile 3.0.1+, but the code is almost the same).
‘abort_to_prompt’ does this:
--8<---------------cut here---------------start------------->8---
  fp = vp->stack_top - fp_offset;
  sp = vp->stack_top - sp_offset;
  /* Continuation gets nargs+1 values: the one more is for the cont.  */
  sp = sp - nargs - 1;
  /* Shuffle abort arguments down to the prompt continuation.  We have
     to be jumping to an older part of the stack.  */
  if (sp < vp->sp)
    abort ();
  sp[nargs].as_scm = cont;
  while (nargs--)
    sp[nargs] = vp->sp[nargs];
  /* Restore VM regs */
  vp->fp = fp;
  vp->sp = sp;
  vp->ip = vra;
--8<---------------cut here---------------end--------------->8---
What if ‘scm_i_vm_mark_stack’ walks the stack right before the ‘vp->fp’
assignment? At that point ‘vp->fp’ and ‘vp->sp’ still describe the old
frames, so it can determine that one of the just-assigned ‘sp[nargs]’
slots is dead, and thus set it to SCM_UNSPECIFIED. By the time we set
‘vp->fp’, the stack slot that we just initialized has already been
overwritten by ‘scm_i_vm_mark_stack’. Down the road, we get something
like:
Wrong type to apply: #<unspecified>
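
To make the suspected interleaving concrete, here is a minimal
single-threaded sketch. It is a toy model, not Guile code: 'struct
toy_vm', its 'dead' map and all 'toy_*' functions are hypothetical
names, the stack is a plain array of ints, and the marker is called
explicitly at the point where the other thread is assumed to run.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: these names and layouts are hypothetical and do not
   match Guile's real 'struct scm_vm' or its slot maps.  */
enum { TOY_STACK_SIZE = 16, TOY_UNSPECIFIED = -1 };

struct toy_vm
{
  int stack[TOY_STACK_SIZE];           /* higher index = older slot */
  unsigned char dead[TOY_STACK_SIZE];  /* 1 = dead according to the frames
                                          reachable from the published
                                          registers */
  size_t sp;                           /* published stack pointer */
};

/* Stand-in for the marker: clobber every slot it believes to be dead,
   scanning from the published sp towards the top of the stack.  */
void
toy_mark_stack (struct toy_vm *vm)
{
  for (size_t i = vm->sp; i < TOY_STACK_SIZE; i++)
    if (vm->dead[i])
      vm->stack[i] = TOY_UNSPECIFIED;
}

/* Stand-in for the tail of 'abort_to_prompt'.  When MARK_IN_BETWEEN is
   nonzero, the marker runs after the arguments were shuffled but before
   the registers are published, which is the suspected bad window.  */
void
toy_abort_to_prompt (struct toy_vm *vm, size_t new_sp, size_t nargs,
                     int mark_in_between)
{
  size_t old_sp = vm->sp;

  assert (new_sp >= old_sp);            /* jumping to an older part */
  assert (new_sp + nargs <= TOY_STACK_SIZE);

  /* Shuffle the abort arguments, highest index first.  */
  for (size_t i = nargs; i-- > 0; )
    vm->stack[new_sp + i] = vm->stack[old_sp + i];

  if (mark_in_between)
    toy_mark_stack (vm);

  /* Publish the registers; in this model the destination slots only
     become live from here on.  */
  vm->sp = new_sp;
  for (size_t i = 0; i < nargs; i++)
    vm->dead[new_sp + i] = 0;
}

/* Run one scenario and return the first argument as the prompt sees it.  */
int
toy_run (int mark_in_between)
{
  struct toy_vm vm = { 0 };

  vm.sp = 2;
  vm.stack[2] = 42;
  vm.stack[3] = 43;
  vm.dead[10] = vm.dead[11] = 1;  /* dead from the old frames' viewpoint */

  toy_abort_to_prompt (&vm, 10, 2, mark_in_between);
  return vm.stack[10];
}
```

In this model, 'toy_run (0)' returns the argument that was passed (42),
while 'toy_run (1)' returns TOY_UNSPECIFIED, the toy counterpart of the
"Wrong type to apply" symptom.
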
I believe this is what I’m seeing here (0x7ff7f838dda0 is being set to
SCM_UNSPECIFIED while thread 2 is in ‘abort_to_prompt’):
--8<---------------cut here---------------start------------->8---
(rr) thread 5
[Switching to thread 5 (Thread 24572.24575)]
#0 scm_i_vm_mark_stack (vp=0x7ff7fd820b48, mark_stack_ptr=0x7ff7fc0ebf90,
mark_stack_limit=0x7ff7fc0fbec0) at vm.c:743
743 break;
(rr) list
738 break;
739 case SLOT_DESC_DEAD:
740 /* This value may become dead as a result of GC,
741 so we can't just leave it on the stack. */
742 sp->as_scm = SCM_UNSPECIFIED;
743 break;
744 }
745 }
746 sp = SCM_FRAME_PREVIOUS_SP (fp);
747 /* Inner frames may have a dead slots map for precise marking.
(rr) p sp->as_scm
$59 = #<unspecified>
(rr) p sp
$60 = (union scm_vm_stack_element *) 0x7ff7f838dda0
(rr) thread 2
[Switching to thread 2 (Thread 24572.24577)]
#0 0x00007ff7fdb7bb36 in __GI___sigsuspend (
set=set@entry=0x7ff7fe132720 <suspend_handler_mask>)
at ../sysdeps/unix/sysv/linux/sigsuspend.c:26
26 ../sysdeps/unix/sysv/linux/sigsuspend.c: No such file or directory.
(rr) frame 4
#4 0x00007ff7fe228f14 in abort_to_prompt (thread=0x7ff7fd820b40,
saved_mra=<optimized out>) at vm.c:1465
1465 sp[nargs] = vp->sp[nargs];
(rr) p sp
$61 = (union scm_vm_stack_element *) 0x7ff7f838dd90
(rr) p fp
$62 = (union scm_vm_stack_element *) 0x7ff7f838ddb0
(rr) p &sp[2]
$63 = (union scm_vm_stack_element *) 0x7ff7f838dda0
(rr) p vp->sp
$64 = (union scm_vm_stack_element *) 0x7ff7f838dcf0
(rr) p vp->fp
$65 = (union scm_vm_stack_element *) 0x7ff7f838dd08
(rr) p vp->stack_bottom
$66 = (union scm_vm_stack_element *) 0x7ff7f838a000
(rr) p vp->stack_top
$67 = (union scm_vm_stack_element *) 0x7ff7f838e000
--8<---------------cut here---------------end--------------->8---
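
The addresses printed in the session above can be cross-checked with a
few lines of arithmetic. This is only a sketch: the 8-byte slot size
and the assumption that the marker scans from ‘vp->sp’ up to
‘vp->stack_top’ are mine, and the 'toy_*' names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: one 'union scm_vm_stack_element' is 8 bytes, which fits
   the 64-bit addresses printed by gdb.  */
enum { TOY_SLOT_SIZE = 8 };

/* Address of SP[INDEX] given the address of SP[0].  */
uint64_t
toy_slot_address (uint64_t sp, unsigned index)
{
  return sp + (uint64_t) index * TOY_SLOT_SIZE;
}

/* Check that the values from the gdb session are consistent with the
   marker clobbering a slot of the frame being set up.  Returns 1 when
   every check holds.  */
int
toy_check_session (void)
{
  const uint64_t new_sp = 0x7ff7f838dd90;     /* $61, local sp */
  const uint64_t new_fp = 0x7ff7f838ddb0;     /* $62, local fp */
  const uint64_t clobbered = 0x7ff7f838dda0;  /* $60, seen by the marker */
  const uint64_t old_sp = 0x7ff7f838dcf0;     /* $64, vp->sp */
  const uint64_t stack_top = 0x7ff7f838e000;  /* $67, vp->stack_top */

  /* The clobbered slot is sp[2] of the frame being set up...  */
  assert (toy_slot_address (new_sp, 2) == clobbered);
  /* ... it lies between the new sp and the new fp...  */
  assert (new_sp <= clobbered && clobbered < new_fp);
  /* ... and inside the range assumed to be scanned by the marker,
     which still starts at the old, not yet updated, vp->sp.  */
  assert (old_sp <= clobbered && clobbered < stack_top);
  return 1;
}
```
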
Comments about this analysis?
How do we fix it? It’s a bit troubling that this is all lock-free. A
fix I can think of is to just re-do the ‘sp[nargs]’ assignments after
the ‘vp->fp’, ‘vp->sp’, and ‘vp->ip’ assignments.
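
The re-do idea can be sketched in a self-contained toy model (stack as
an array of ints, marker called explicitly in the worst-case window).
Everything named 'toy_*' is hypothetical and this is not a patch for
vm.c. The sketch also assumes that source and destination ranges do
not overlap and that the source slots are still intact when they are
copied a second time; it says nothing about compiler or CPU
reordering of the stores.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only; none of these names exist in Guile.  */
enum { TOY_STACK_SIZE = 16, TOY_UNSPECIFIED = -1 };

struct toy_vm
{
  int stack[TOY_STACK_SIZE];           /* higher index = older slot */
  unsigned char dead[TOY_STACK_SIZE];  /* 1 = dead for the published frames */
  size_t sp;                           /* published stack pointer */
};

/* Stand-in for the marker: clobber slots it believes to be dead.  */
void
toy_mark_stack (struct toy_vm *vm)
{
  for (size_t i = vm->sp; i < TOY_STACK_SIZE; i++)
    if (vm->dead[i])
      vm->stack[i] = TOY_UNSPECIFIED;
}

/* Copy NARGS slots from index FROM to index TO, highest index first.  */
void
toy_copy_args (struct toy_vm *vm, size_t from, size_t to, size_t nargs)
{
  for (size_t i = nargs; i-- > 0; )
    vm->stack[to + i] = vm->stack[from + i];
}

/* Variant of the abort path that repeats the copy after the registers
   are published.  The marker is run in the bad window and once more
   afterwards to show that the slots now survive.  */
int
toy_abort_with_recopy (struct toy_vm *vm, size_t new_sp, size_t nargs)
{
  size_t old_sp = vm->sp;

  assert (new_sp >= old_sp + nargs);   /* assumption: no overlap */
  assert (new_sp + nargs <= TOY_STACK_SIZE);

  toy_copy_args (vm, old_sp, new_sp, nargs);  /* first copy */
  toy_mark_stack (vm);                        /* marker hits the window */

  vm->sp = new_sp;                            /* publish the registers */
  for (size_t i = 0; i < nargs; i++)
    vm->dead[new_sp + i] = 0;                 /* slots are live from now on */

  toy_copy_args (vm, old_sp, new_sp, nargs);  /* re-do the assignments */
  toy_mark_stack (vm);                        /* a later mark is harmless */

  return vm->stack[new_sp];
}

/* Same scenario as the failing one; returns the first argument.  */
int
toy_recopy_demo (void)
{
  struct toy_vm vm = { 0 };

  vm.sp = 2;
  vm.stack[2] = 42;
  vm.stack[3] = 43;
  vm.dead[10] = vm.dead[11] = 1;  /* dead from the old frames' viewpoint */

  return toy_abort_with_recopy (&vm, 10, 2);
}
```

In this model the second copy restores the arguments even though the
marker clobbered them in between, so 'toy_recopy_demo ()' returns 42.
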
Thoughts?
Ludo’.