unofficial mirror of bug-guix@gnu.org 
 help / color / mirror / code / Atom feed
From: ludo@gnu.org (Ludovic Courtès)
To: Mark H Weaver <mhw@netris.org>
Cc: Andy Wingo <wingo@igalia.com>, 28211@debbugs.gnu.org
Subject: bug#28211: Grafting code triggers GC/thread-safety issue on Guile 2.2.2
Date: Wed, 09 May 2018 09:17:35 +0200	[thread overview]
Message-ID: <87zi19z4a8.fsf@gnu.org> (raw)
In-Reply-To: <87d0y5k6sl.fsf@netris.org> (Mark H. Weaver's message of "Tue, 08 May 2018 20:32:26 -0400")

Hi Mark,

Mark H Weaver <mhw@netris.org> skribis:

> ludo@gnu.org (Ludovic Courtès) writes:
> [...]
>> Thread 1 (Thread 0x7f6fe6f5d700 (LWP 2856)):
>> #0  0x00007f7019db0d79 in scm_is_pair (x=<error reading variable: ERROR: Cannot access memory at address 0x0>0x0) at ../libguile/pairs.h:159
>> #1  scm_ilength (sx=<optimized out>) at list.c:190
> [...]
>> What this means is that Thread 1 gets NULL instead of a list as its
>> on-stack argument (vm-engine.c:909 is ‘tail-apply’).
>>
>> How can arguments on the VM stack be zeroed?
>
> I doubt that's what happened, because I expect that each VM stack is
> dedicated to a single hardware thread.  In theory, if a single VM stack
> is used by one thread, and then later used by another thread,
> thread-safety issues on the VM stack could arise in the absense of
> proper thread synchronization.
>
> However, I think it's _far_ more likely that the NULL argument on the
> stack was copied from memory shared by multiple threads without proper
> thread synchronization.

It could be this, but this particular case is an “embarrassingly
parallel” program where threads work on independent data sets without
any inter-thread communication whatsoever.

What you describe could nevertheless be happening at a lower level,
within libguile, though it’s not clear to me where that could be.

>> I commented out the MADV_DONTNEED call to be sure, but I can still
>> reproduce the bug.
>
> I strongly doubt that the MADV_DONTNEED is relevant to this issue.

I thought about it because that’s one way part of the VM stack could be
zeroed out.

>> Then I thought vp->sp might be out-of-sync compared to the local
>> variable ‘sp’, which in turn could cause ‘scm_i_vm_mark_stack’ to not
>> mark a few items on the tip of the stack.  So I did this:
>>
>> diff --git a/libguile/vm-engine.c b/libguile/vm-engine.c
>> index 9509cd643..1136b2271 100644
>> --- a/libguile/vm-engine.c
>> +++ b/libguile/vm-engine.c
>> @@ -151,7 +151,8 @@
>>     code, or otherwise push anything on the stack, you will need to
>>     CACHE_SP afterwards to restore the possibly-changed stack pointer.  */
>>  
>> -#define SYNC_IP() vp->ip = (ip)
>> +#define SYNC_IP()                                       \
>> +  do { vp->ip = (ip); vp->sp = (sp); } while (0)
>
> I don't see how a change like this could be useful for any thread safety
> problem.

I witnessed situations where the local ‘sp’ seemed to be different from
‘vp->sp’, though it’s hard to tell because I’m unsure where gcc stores
‘sp’.  Here’s an example:

--8<---------------cut here---------------start------------->8---
(gdb) frame
#16 0x00007fabf30af2ca in vm_regular_engine (thread=0x24e6000, vp=0x22de6c0, registers=0x0, resume=40) at vm-engine.c:785
785	      ret = scm_apply_subr (sp, FRAME_LOCALS_COUNT ());
(gdb) p vp->sp
$5 = (union scm_vm_stack_element *) 0x7fabec158718
(gdb) p (union scm_vm_stack_element *) $r13
$6 = (union scm_vm_stack_element *) 0x7fabec158e30
(gdb) p $6 - $5
$7 = 227
(gdb) p vp->fp
$8 = (union scm_vm_stack_element *) 0x7fabec158730
(gdb) p vp->stack_top
$9 = (union scm_vm_stack_element *) 0x7fabec159000
(gdb) p vp->stack_bottom
$10 = (union scm_vm_stack_element *) 0x7fabec158000
(gdb) p vp->sp_min_since_gc
$11 = (union scm_vm_stack_element *) 0x7fabec158620
(gdb) info registers
rax            0x1	1
rbx            0xa	10
rcx            0x28	40
rdx            0x0	0
rsi            0x23f1920	37689632
rdi            0x24e6000	38690816
rbp            0x22de6c0	0x22de6c0
rsp            0x7fabcce18660	0x7fabcce18660
r8             0x1	1
r9             0x1	1
r10            0x100	256
r11            0x23f1920	37689632
r12            0x7fabf330b8c0	140376496191680
r13            0x7fabec158e30	140376376970800
r14            0x7fabf30c6d7c	140376493813116
r15            0x7fabf0fa7f28	140376459083560
rip            0x7fabf30af2ca	0x7fabf30af2ca <vm_regular_engine+14058>
eflags         0x10246	[ PF ZF IF RF ]
cs             0x33	51
ss             0x2b	43
ds             0x0	0
es             0x0	0
fs             0x0	0
gs             0x0	0
--8<---------------cut here---------------end--------------->8---

My hypothesis was that such a bug could lead heap elements to be
reclaimed too early.  This is more likely to happen in a multi-threaded
context because one thread could be allocating memory and triggering a
GC while another thread is invoking a subr with an out-of-sync ‘vp->sp’.

Does that make sense?

> For now, I would suggest avoiding multi-threaded code in Guix, or at
> least to avoid loading any Scheme code from multiple threads.
>
> How about using multiple processes instead?

We could do that, but with my Guile maintainer hat on (a hat I don’t
wear that often as you might have noticed ;-)) I think it would be nice
to fix the issue.

Thanks,
Ludo’.

  reply	other threads:[~2018-05-09  7:19 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-08-23 22:20 bug#28211: Grafting code triggers GC/thread-safety issue on Guile 2.2.2 Ludovic Courtès
2017-08-23 22:48 ` Ludovic Courtès
2018-04-24 16:03 ` Ludovic Courtès
2018-05-08 21:55   ` Ludovic Courtès
2018-05-09  0:32     ` Mark H Weaver
2018-05-09  7:17       ` Ludovic Courtès [this message]
2018-05-09  9:11       ` Andy Wingo
2018-05-10  6:50         ` Mark H Weaver
2018-05-10  7:53           ` Andy Wingo
2018-06-29 15:03             ` bug#28211: Stack marking issue in multi-threaded code Ludovic Courtès
2018-06-29 16:54               ` Mark H Weaver
2018-06-29 21:18                 ` Ludovic Courtès
2018-06-29 23:18                   ` Mark H Weaver
2018-06-30 20:53                     ` Ludovic Courtès
2018-06-30 21:49                       ` Mark H Weaver
2018-07-01 10:12                         ` Andy Wingo
2018-07-03 19:01                           ` Mark H Weaver
2020-03-12 21:59               ` bug#28211: Stack marking issue in multi-threaded code, 2020 edition Ludovic Courtès
2020-03-13 22:38                 ` Ludovic Courtès
2020-03-17 21:16                 ` Andy Wingo
2018-05-10 15:48     ` bug#28211: Grafting code triggers GC/thread-safety issue on Guile 2.2.2 Mark H Weaver
2018-05-10 16:01       ` Mark H Weaver
2018-07-02 10:28 ` Ludovic Courtès

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://guix.gnu.org/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87zi19z4a8.fsf@gnu.org \
    --to=ludo@gnu.org \
    --cc=28211@debbugs.gnu.org \
    --cc=mhw@netris.org \
    --cc=wingo@igalia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).