From: ludo@gnu.org (Ludovic Courtès)
To: Mark H Weaver <mhw@netris.org>
Cc: Andy Wingo <wingo@igalia.com>, 28211@debbugs.gnu.org
Subject: bug#28211: Grafting code triggers GC/thread-safety issue on Guile 2.2.2
Date: Wed, 09 May 2018 09:17:35 +0200 [thread overview]
Message-ID: <87zi19z4a8.fsf@gnu.org> (raw)
In-Reply-To: <87d0y5k6sl.fsf@netris.org> (Mark H. Weaver's message of "Tue, 08 May 2018 20:32:26 -0400")
Hi Mark,
Mark H Weaver <mhw@netris.org> skribis:
> ludo@gnu.org (Ludovic Courtès) writes:
> [...]
>> Thread 1 (Thread 0x7f6fe6f5d700 (LWP 2856)):
>> #0 0x00007f7019db0d79 in scm_is_pair (x=<error reading variable: ERROR: Cannot access memory at address 0x0>0x0) at ../libguile/pairs.h:159
>> #1 scm_ilength (sx=<optimized out>) at list.c:190
> [...]
>> What this means is that Thread 1 gets NULL instead of a list as its
>> on-stack argument (vm-engine.c:909 is ‘tail-apply’).
>>
>> How can arguments on the VM stack be zeroed?
>
> I doubt that's what happened, because I expect that each VM stack is
> dedicated to a single hardware thread. In theory, if a single VM stack
> is used by one thread, and then later used by another thread,
> thread-safety issues on the VM stack could arise in the absense of
> proper thread synchronization.
>
> However, I think it's _far_ more likely that the NULL argument on the
> stack was copied from memory shared by multiple threads without proper
> thread synchronization.
It could be this, but this particular case is an “embarrassingly
parallel” program where threads work on independent data sets without
any inter-thread communication whatsoever.
What you describe could nevertheless be happening at a lower level,
within libguile, though it’s not clear to me where that could be.
>> I commented out the MADV_DONTNEED call to be sure, but I can still
>> reproduce the bug.
>
> I strongly doubt that the MADV_DONTNEED is relevant to this issue.
I thought about it because that’s one way part of the VM stack could be
zeroed out.
>> Then I thought vp->sp might be out-of-sync compared to the local
>> variable ‘sp’, which in turn could cause ‘scm_i_vm_mark_stack’ to not
>> mark a few items on the tip of the stack. So I did this:
>>
>> diff --git a/libguile/vm-engine.c b/libguile/vm-engine.c
>> index 9509cd643..1136b2271 100644
>> --- a/libguile/vm-engine.c
>> +++ b/libguile/vm-engine.c
>> @@ -151,7 +151,8 @@
>> code, or otherwise push anything on the stack, you will need to
>> CACHE_SP afterwards to restore the possibly-changed stack pointer. */
>>
>> -#define SYNC_IP() vp->ip = (ip)
>> +#define SYNC_IP() \
>> + do { vp->ip = (ip); vp->sp = (sp); } while (0)
>
> I don't see how a change like this could be useful for any thread safety
> problem.
I witnessed situations where the local ‘sp’ seemed to be different from
‘vp->sp’, though it’s hard to tell because I’m unsure where gcc stores
‘sp’. Here’s an example:
--8<---------------cut here---------------start------------->8---
(gdb) frame
#16 0x00007fabf30af2ca in vm_regular_engine (thread=0x24e6000, vp=0x22de6c0, registers=0x0, resume=40) at vm-engine.c:785
785 ret = scm_apply_subr (sp, FRAME_LOCALS_COUNT ());
(gdb) p vp->sp
$5 = (union scm_vm_stack_element *) 0x7fabec158718
(gdb) p (union scm_vm_stack_element *) $r13
$6 = (union scm_vm_stack_element *) 0x7fabec158e30
(gdb) p $6 - $5
$7 = 227
(gdb) p vp->fp
$8 = (union scm_vm_stack_element *) 0x7fabec158730
(gdb) p vp->stack_top
$9 = (union scm_vm_stack_element *) 0x7fabec159000
(gdb) p vp->stack_bottom
$10 = (union scm_vm_stack_element *) 0x7fabec158000
(gdb) p vp->sp_min_since_gc
$11 = (union scm_vm_stack_element *) 0x7fabec158620
(gdb) info registers
rax 0x1 1
rbx 0xa 10
rcx 0x28 40
rdx 0x0 0
rsi 0x23f1920 37689632
rdi 0x24e6000 38690816
rbp 0x22de6c0 0x22de6c0
rsp 0x7fabcce18660 0x7fabcce18660
r8 0x1 1
r9 0x1 1
r10 0x100 256
r11 0x23f1920 37689632
r12 0x7fabf330b8c0 140376496191680
r13 0x7fabec158e30 140376376970800
r14 0x7fabf30c6d7c 140376493813116
r15 0x7fabf0fa7f28 140376459083560
rip 0x7fabf30af2ca 0x7fabf30af2ca <vm_regular_engine+14058>
eflags 0x10246 [ PF ZF IF RF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
--8<---------------cut here---------------end--------------->8---
My hypothesis was that such a bug could lead heap elements to be
reclaimed too early. This is more likely to happen in a multi-threaded
context because one thread could be allocating memory and triggering a
GC while another thread is invoking a subr with an out-of-sync ‘vp->sp’.
Does that make sense?
> For now, I would suggest avoiding multi-threaded code in Guix, or at
> least to avoid loading any Scheme code from multiple threads.
>
> How about using multiple processes instead?
We could do that, but with my Guile maintainer hat on (a hat I don’t
wear that often as you might have noticed ;-)) I think it would be nice
to fix the issue.
Thanks,
Ludo’.
next prev parent reply other threads:[~2018-05-09 7:19 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-08-23 22:20 bug#28211: Grafting code triggers GC/thread-safety issue on Guile 2.2.2 Ludovic Courtès
2017-08-23 22:48 ` Ludovic Courtès
2018-04-24 16:03 ` Ludovic Courtès
2018-05-08 21:55 ` Ludovic Courtès
2018-05-09 0:32 ` Mark H Weaver
2018-05-09 7:17 ` Ludovic Courtès [this message]
2018-05-09 9:11 ` Andy Wingo
2018-05-10 6:50 ` Mark H Weaver
2018-05-10 7:53 ` Andy Wingo
2018-06-29 15:03 ` bug#28211: Stack marking issue in multi-threaded code Ludovic Courtès
2018-06-29 16:54 ` Mark H Weaver
2018-06-29 21:18 ` Ludovic Courtès
2018-06-29 23:18 ` Mark H Weaver
2018-06-30 20:53 ` Ludovic Courtès
2018-06-30 21:49 ` Mark H Weaver
2018-07-01 10:12 ` Andy Wingo
2018-07-03 19:01 ` Mark H Weaver
2020-03-12 21:59 ` bug#28211: Stack marking issue in multi-threaded code, 2020 edition Ludovic Courtès
2020-03-13 22:38 ` Ludovic Courtès
2020-03-17 21:16 ` Andy Wingo
2018-05-10 15:48 ` bug#28211: Grafting code triggers GC/thread-safety issue on Guile 2.2.2 Mark H Weaver
2018-05-10 16:01 ` Mark H Weaver
2018-07-02 10:28 ` Ludovic Courtès
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://guix.gnu.org/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87zi19z4a8.fsf@gnu.org \
--to=ludo@gnu.org \
--cc=28211@debbugs.gnu.org \
--cc=mhw@netris.org \
--cc=wingo@igalia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/guix.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).