From mboxrd@z Thu Jan 1 00:00:00 1970 From: Ludovic =?UTF-8?Q?Court=C3=A8s?= Subject: bug#28211: Stack marking issue in multi-threaded code, 2020 edition Date: Fri, 13 Mar 2020 23:38:01 +0100 Message-ID: <87blozkhiu.fsf@gnu.org> References: <877exuj58y.fsf@gnu.org> <87d0yo1tie.fsf@gnu.org> <87fu3124nt.fsf@gnu.org> <87d0y5k6sl.fsf@netris.org> <871sel6vnq.fsf@igalia.com> <87fu30dmx3.fsf@netris.org> <87tvrg3q1d.fsf@igalia.com> <87a7rdvdm9.fsf_-_@gnu.org> <87tv2tp74g.fsf_-_@gnu.org> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Return-path: Received: from eggs.gnu.org ([2001:470:142:3::10]:49235) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jCsxL-0003D1-UH for bug-guix@gnu.org; Fri, 13 Mar 2020 18:39:05 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1jCsxK-0008LI-LV for bug-guix@gnu.org; Fri, 13 Mar 2020 18:39:03 -0400 Received: from debbugs.gnu.org ([209.51.188.43]:54364) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1jCsxK-0008Kk-Hi for bug-guix@gnu.org; Fri, 13 Mar 2020 18:39:02 -0400 Sender: "Debbugs-submit" Resent-Message-ID: In-Reply-To: <87tv2tp74g.fsf_-_@gnu.org> ("Ludovic \=\?utf-8\?Q\?Court\=C3\=A8s\?\= \=\?utf-8\?Q\?\=22's\?\= message of "Thu, 12 Mar 2020 22:59:11 +0100") List-Id: Bug reports for GNU Guix List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-guix-bounces+gcggb-bug-guix=m.gmane-mx.org@gnu.org Sender: "bug-Guix" To: Andy Wingo Cc: 28211@debbugs.gnu.org --=-=-= Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Hi, Ludovic Court=C3=A8s skribis: > What if =E2=80=98scm_i_vm_mark_stack=E2=80=99 walks the stack right befor= e the =E2=80=98vp->fp=E2=80=99 > assignment? It can determine that one of the just-assigned =E2=80=98sp[n= args]=E2=80=99 > is a dead slot, and thus set it to SCM_UNSPECIFIED. Later, when we set > =E2=80=98vp->fp=E2=80=99, that stack slot that we just initialized has be= en overwritten > by =E2=80=98scm_i_vm_mark_stack=E2=80=99. Down the road, we get somethin= g like: > > Wrong type to apply: # I believe the patch below solves this problem. Also attached is a small program that reproduces the bug (without the patch) after a couple of minutes. Thoughts? The full grafting test case exposes another VM stack-corruption-looking crash that I=E2=80=99m still investigating. Ludo=E2=80=99. --=-=-= Content-Type: text/x-patch Content-Disposition: inline diff --git a/libguile/vm.c b/libguile/vm.c index b20c6eb5f..5cc934e99 100644 --- a/libguile/vm.c +++ b/libguile/vm.c @@ -1351,7 +1351,8 @@ scm_i_vm_emergency_abort (SCM *tag_and_argv, size_t n) scm_t_bits *prompt; scm_t_dynstack_prompt_flags flags; ptrdiff_t fp_offset, sp_offset; - union scm_vm_stack_element *fp, *sp; + union scm_vm_stack_element *fp; + volatile union scm_vm_stack_element *sp; SCM *argv; uint32_t *vra; uint8_t *mra; @@ -1392,16 +1393,19 @@ scm_i_vm_emergency_abort (SCM *tag_and_argv, size_t n) to be jumping to an older part of the stack. */ if (sp < vp->sp) abort (); - sp[nargs].as_scm = cont; - - while (nargs--) - sp[nargs].as_scm = *argv++; /* Restore VM regs */ vp->fp = fp; - vp->sp = sp; + vp->sp = (union scm_vm_stack_element *) sp; vp->ip = vra; + /* Restore the arguments on SP. This must be done after 'vp->fp' has + been set so that a concurrent 'scm_i_vm_mark_stack' does not + overwrite it (see ). */ + sp[nargs].as_scm = cont; + while (nargs--) + sp[nargs].as_scm = *argv++; + /* Jump! */ vp->mra_after_abort = mra; longjmp (*registers, 1); @@ -1417,7 +1421,8 @@ abort_to_prompt (scm_thread *thread, uint8_t *saved_mra) scm_t_bits *prompt; scm_t_dynstack_prompt_flags flags; ptrdiff_t fp_offset, sp_offset; - union scm_vm_stack_element *fp, *sp; + union scm_vm_stack_element *fp, *orig_sp; + volatile union scm_vm_stack_element *sp; uint32_t *vra; uint8_t *mra; jmp_buf *registers; @@ -1452,6 +1457,7 @@ abort_to_prompt (scm_thread *thread, uint8_t *saved_mra) /* Recompute FP, as scm_dynstack_unwind may have expanded the stack. */ fp = vp->stack_top - fp_offset; sp = vp->stack_top - sp_offset; + orig_sp = vp->sp; /* Continuation gets nargs+1 values: the one more is for the cont. */ sp = sp - nargs - 1; @@ -1460,15 +1466,19 @@ abort_to_prompt (scm_thread *thread, uint8_t *saved_mra) to be jumping to an older part of the stack. */ if (sp < vp->sp) abort (); - sp[nargs].as_scm = cont; - while (nargs--) - sp[nargs] = vp->sp[nargs]; /* Restore VM regs */ vp->fp = fp; - vp->sp = sp; + vp->sp = (union scm_vm_stack_element *) sp; vp->ip = vra; + /* Restore the arguments on SP. This must be done after 'vp->fp' has + been set so that a concurrent 'scm_i_vm_mark_stack' does not + overwrite it (see ). */ + sp[nargs].as_scm = cont; + while (nargs--) + sp[nargs] = orig_sp[nargs]; + /* If there are intervening C frames, then jump over them, making a nonlocal exit. Otherwise fall through and let the VM pick up where it left off. */ --=-=-= Content-Type: text/plain Content-Disposition: inline; filename=grafts-crash.reduced.scm Content-Description: the reproducer ;; https://issues.guix.gnu.org/issue/28211 (use-modules (ice-9 threads)) (define (thunk) (catch 'foo (lambda () (throw 'foo (iota 10))) (lambda (key lst) (unless (equal? lst (iota 10)) (primitive-_exit 42))))) (n-par-for-each (* 2 (current-processor-count)) (lambda _ (let loop () (thunk) (loop))) (iota 1000)) --=-=-=--